Preface


The purpose of this book is to introduce mathematical statistics to 
readers with good undergraduate backgrounds in mathematics. No 
previous knowledge of probability or statistics is assumed on the part of 
the reader, although having had one or more good undergraduate courses 
in these subjects certainly will be found useful. The book has been 
prepared mainly from material which has been successively revised and 
presented to graduate students at Princeton University since World War II. 
An early version of some of the material was issued in 1943 in lithoprinted
form by the Princeton University Press under the title: Mathematical
Statistics.

The field of mathematical statistics and its applications has been growing 
at a spectacular rate for more than a quarter of a century—a rate which 
now results in a flow of new material with which no single individual can 
keep pace. Although most of the research results in the field during this 
period have appeared in some half dozen specialized journals, substantial 
numbers of important papers have been published and continue to be 
published in many scattered scientific journals. 

No attempt has been made here to write a comprehensive treatment of 
the main results in this body of literature. Instead, I have made a selection 
of basic material in mathematical statistics in accordance with my own 
preferences and prejudices, with inclinations toward trying to make a 
unified and systematic presentation of classical results of mathematical 
statistics, together with some of the more important contemporary results, 
in a framework of modern probability theory, without going into too many 
ramifications. It is therefore inevitable that some topics will be considered 
to have been slighted or inexcusably omitted, and understandably so, by 
enthusiasts and specialists on those topics. On the other hand, the reader 
who may become interested in topics lightly covered or barely mentioned 
will find references for further reading. Indeed, it is my hope that any 
mathematically qualified reader without previous knowledge of mathematical
statistics who studies this book and becomes interested in
further systematic study of the subject, or parts of it, will be able to continue
such study in the literature with the guidance, initially at least, of the
references cited throughout the book and listed at the end. 

More than four hundred problems are included at the ends of the various 
chapters of the book to provide the reader with opportunities to increase 
his understanding and facility with the substance of mathematical statistics.
Many of these problems also serve as vehicles for introducing additional
topics and interesting results in brief form which could not be discussed 
in detail in the book on account of space limitations. 

Some readers will wonder why more discussion was not interwoven 
between the mathematical results in the book and the statistical methodology
which rests upon these results. Discussion of this kind has been kept
deliberately at a minimal level. Consideration of statistical methodology
and its applications is no less important than the treatment of the underlying
mathematical theory of this methodology. Experience indicates,
however, that both aspects of statistics, that is, the mathematical theory and
the statistical methodology based on this theory, are most effectively 
combined in research papers, monographs, and books restricted to specific 
topics. In a fairly comprehensive book on mathematical statistics such 
as this it is my conviction that it would be most unwise to attempt to deal 
with both aspects of each topic with equal emphasis. A careful presentation
of basic mathematical statistics and the underlying mathematical
theory of a wide variety of topics in statistics, with just enough discussion
and examples to clarify the basic concepts, such as that attempted here, is
a much more feasible undertaking. I believe this approach to be prerequisite
to a fuller understanding of statistical methodology, not to
mention the one I find most satisfying.

Modern mathematical statistics depends heavily on the theory of probability.
An attempt has been made therefore to set this entire treatment
onto an adequate foundation in modern probability theory without 
actually constructing the foundation. That would be a task beyond the 
scope of this book and also unnecessary, since several excellent books 
already exist on the subject which are referred to at appropriate points
throughout the book in case the reader may wish to do some further reading 
in probability theory. 

In a book of this kind which covers such a span of topics, the problem of 
devising uniform notation and terminology throughout the book has not 
been easy. Some of the notation and terms will look unfamiliar to mathematical
statisticians, but this is due partly to the price which has to be paid
for consistency and partly to the sheer need for introducing new terms and
notation in connection with topics which have never been treated systematically
in the literature of mathematical statistics.

First drafts of most parts of this book concerning order statistics and 
nonparametric statistical inference were written when I was a Fulbright 
research scholar at Cambridge University in the spring and summer of 
1951, at which time I planned to write a monograph on that subject. 
The opportunity for research and writing provided by that appointment 
is gratefully acknowledged. It was later decided, however, to incorporate 
that material into the present more comprehensive book on mathematical 
statistics. This more ambitious project could not have been undertaken 
without support such as that which has been provided by the Office of 
Naval Research. I take this occasion to express my deep appreciation for 
this support. 

I have had the benefit of valuable advice and criticism of colleagues, 
former research associates, and graduate students at Princeton throughout 
the preparation of this book. A wide range of comments and advice
generously given by J. W. Tukey and discussions with F. J. Anscombe 
have been especially useful. Criticism and suggestions by V. S. Varadarajan 
concerning some basic points in probability theory were very helpful. The 
author is indebted to D. M. Brown, D. A. Freedman, I. Guttman, A. T. 
James, and F. M. Sand for reading major portions of the manuscript and 
suggesting improvements. Thanks also go to D. R. Brillinger, who worked 
through more than three hundred and fifty problems contained in the 
manuscript before the final fifty-odd problems were added, and to J. A. 
Hartigan who read proof and weeded out numerous errors and mistakes 
and otherwise improved the final product. The advice of these colleagues 
and associates and of many other friends was always highly valued even 
though some of it was not accepted. I alone am responsible for errors and
inaccuracies which remain and I shall appreciate having them called to my
attention by readers who discover them. Finally, I express my appreciation
to Mrs. Rebecca Werkman and to Mrs. Emily Sorenson for their diligence 
in typing and for their patience in photostating the manuscript. 

Samuel S. Wilks 

November, 1961 

Remarks Concerning Second Printing 

An attempt has been made in this second printing to correct various 
errors and inaccuracies in the first printing which have been called to 
my attention by several readers. I would like to especially acknowledge 
my thanks to D. R. Cox, P. C. Fishburn, I. Guttman, E. J. Hannan, 
W. Hoeffding, S. Kullback, M. Kupperman and G. P. Patil for the errors 
they pointed out. 


April, 1963 


Samuel S. Wilks 
CHAPTER 1


Preliminaries 


1.1 SAMPLE SPACES AND EVENTS 

Mathematical statistics is founded on the theory of probability, which, 
in turn, depends on the theory of measure and integration for its precise 
description and treatment. The measure-theoretic description of probability,
which will be used in this book, was formulated by Kolmogorov
(1933a). It will be sufficient for our purposes to present here an introduction 
covering only the basic definitions, concepts, and machinery of probability 
theory in a form useful for mathematical statistics. The reader interested 
in a more detailed and comprehensive account of the theory of probability 
should refer to books by Doob (1953), Gnedenko and Kolmogorov 
(1954), Feller (1957), Kolmogorov (1933a), Lévy (1925, 1937), and Loève
(1955). For similar information concerning measure theory he is referred 
to Halmos (1950) and Munroe (1953). 

In presenting the essentials of probability theory needed in this book, 
we must first deal with the notions of sample space and event, description 
of events, and combination and decomposition of events. The theory of 
sets furnishes the machinery for handling these concepts. 

We shall denote by R a set of elements e which will be called sample
points or event points or more briefly points. The number of sample points 
may be finite or infinite. R is called the sample space or outcome space. We 
shall use the former of these two terms. A sample point may be thought 
of as a possible outcome of a trial, experiment, or operation, performed 
under a given set of conditions, although we shall make no attempt to 
define these terms formally. The sample space R is simply the set of all 
possible outcomes which could be realized when an operation is performed 
under the given set of conditions. 

Examples. Some illustrative examples may help fix the ideas. In tossing a
coin once where e represents an arbitrary face turning up, the sample space R
consists of two sample points: Head, Tail. In tossing a coin twice where e
represents an arbitrary combination of faces turning up, the sample space R
consists of four sample points which may be abbreviated as HH, HT, TH, TT.
Thus, if a coin is actually tossed twice, one of these four sample points is realized.

In dealing a single hand of 13 bridge cards where e represents an arbitrary
bridge hand, the sample space R consists of $\binom{52}{13}$ sample points, each sample
point being one of the possible hands. Thus, if a hand of 13 cards is actually
dealt, one of the $\binom{52}{13}$ sample points in R is realized.


If a light bulb is allowed to burn continuously until it “expires” and e represents
the possible length of time the bulb burns, then e is a positive number and the
sample space R (idealized) consists of all positive numbers. If k light bulbs
$B_1, \ldots, B_k$ are allowed to burn continuously until they all “expire,” then e
may be taken as a set of k positive numbers representing the
possible burning lives of $B_1, \ldots, B_k$, and the sample space R (idealized)
consists of the points in k-dimensional Euclidean space whose coordinates are
all positive.

In all of these examples it should be noted that an operation is performed 
and the result of the operation is described by means of a sample point e which 
in turn belongs to a sample space R. 
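
The finite sample spaces in these examples can be enumerated directly. The following is a minimal Python sketch; the names `one_toss`, `two_tosses`, and `n_bridge_hands` are illustrative choices and do not appear in the text, and the bridge count assumes a standard 52-card deck.

```python
from itertools import product
from math import comb

# Sample space for one toss of a coin: two sample points.
one_toss = {"H", "T"}

# Sample space for two tosses: every ordered pair of faces.
two_tosses = {"".join(pair) for pair in product("HT", repeat=2)}

# Number of sample points for a single bridge hand:
# the number of 13-card subsets of a 52-card deck.
n_bridge_hands = comb(52, 13)

print(sorted(two_tosses))
print(n_bridge_hands)
```

The light-bulb examples have uncountably many sample points and cannot be listed this way; there the sample space is described by a condition (all coordinates positive) rather than by enumeration.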


A sample point e is sometimes called an elementary event and a set E
of sample points, which is a subset of the points in R, is called an event.
An event, of course, may consist of only one point. When we say that an
event E occurs, we mean that the sample point e (representing the outcome
of the experiment, trial, or operation) is contained in E. In general, there
is more interest in events and classes of events than there is in individual
sample points as such. Or to state the matter a little more precisely: We
are usually more interested in probabilities associated with events than
probabilities associated with individual sample points. The only events
we are interested in are measurable events, that is, sets having associated
probabilities. The notion of measurability will be discussed in Section 1.4.

In set theory terminology, we shall be interested in certain classes of 
subsets of R and probabilities (set functions) to be defined on the sets 
belonging to such classes. A set of points in the sample space R, however, 
is also called an event. Actually, it will be convenient to use the terms
“set” and “event” interchangeably.

Before proceeding further in the discussion of these classes we shall 
introduce the basic principles of the algebra of sets. 


1.2 DEFINITIONS AND RULES FOR COMBINING AND 
DECOMPOSING EVENTS 

In this section, each of the sets $E, E_1, E_2, \ldots$ referred to will be events
consisting of sample points in a sample space R. Such sets are sometimes
referred to as e sets. A finite sequence of n sets will be written as $E_1, \ldots, E_n$,
and a countably infinite sequence as $E_1, E_2, \ldots$.

If e is an element of E, we denote this by writing

(1.2.1) $e \in E$.

We also say that e belongs to E.

If $E_1$ and $E_2$ are two sets such that every point in $E_1$ is contained in
$E_2$, then $E_1$ is called a subset of $E_2$ and we write

(1.2.2) $E_1 \subset E_2$ or $E_2 \supset E_1$.

Note that if $E_1$ consists of exactly one point, say e, then $E_1 \subset E_2$ is
equivalent to the statement $e \in E_2$.

Two sets, $E_1$ and $E_2$, are equal if $E_1 \subset E_2$ and $E_2 \subset E_1$, in which case we
write

(1.2.3) $E_1 = E_2$.

The empty set (or null set) is the set which contains no points. If E
contains no points, we write

(1.2.4) $E = \phi$.

The null set $\phi$ is a subset of every set $E \subset R$.

The set of all points contained in both $E_1$ and $E_2$ is called the intersection
or product of $E_1$ and $E_2$ and will be denoted by

(1.2.5) $E_1 \cap E_2$,

sometimes read “$E_1$ cap $E_2$.”

The sets $E_1$ and $E_2$ are disjoint, that is, contain no common points, if

(1.2.6) $E_1 \cap E_2 = \phi$.

The intersection of a sequence of sets $E_1, E_2, \ldots$ is denoted by

(1.2.7) $\bigcap_{\alpha=1}^{n} E_\alpha$

if there are n of these sets, and by

(1.2.8) $\bigcap_{\alpha=1}^{\infty} E_\alpha$

if the sequence is denumerably infinite. More generally, for an arbitrary
collection of sets $\{E_\alpha : \alpha \in T\}$ the intersection of the sets is denoted by
$\bigcap_{\alpha \in T} E_\alpha$, or more briefly by $\bigcap_\alpha E_\alpha$ if the range T of values of $\alpha$ is
clear from the text. T can be, of course, not only a set of integers, but
elements in a more general space. In nearly all instances with which we
shall be concerned in this book T will be a sequence (a finite or denumerable
set of integers). If $E_\alpha \cap E_\beta = \phi$ for all $\alpha \neq \beta$, the sets $E_\alpha$ are pairwise
or mutually disjoint.

The set of all points contained in at least one of the sets $E_1$ and $E_2$ is
called the union or sum of $E_1$ and $E_2$ and will be denoted by*

(1.2.9) $E_1 \cup E_2$,

sometimes read “$E_1$ cup $E_2$.” Note that if $E_1$ is any e set and if $E_2 = \phi$,
then $E_1 \cup E_2 = E_1$.

The union of a finite sequence of sets $E_1, \ldots, E_n$ is denoted by

(1.2.10) $\bigcup_{\alpha=1}^{n} E_\alpha$

and if the sequence is denumerably infinite, we write

(1.2.11) $\bigcup_{\alpha=1}^{\infty} E_\alpha$,

or for an arbitrary collection of sets $\{E_\alpha : \alpha \in T\}$ we may write $\bigcup_\alpha E_\alpha$.

The set consisting of all points of $E_1$ not contained in $E_2$ is called the
difference between $E_1$ and $E_2$ and is written as

(1.2.12) $E_1 - E_2$.

If $E_1 \subset E_2$, then clearly $E_1 - E_2$ is the empty set. If $E_1 \cap E_2 = \phi$, then
$E_1 - E_2 = E_1$. If $E_2 \subset E_1$, $E_1 - E_2$ is called the proper difference of
$E_1$ and $E_2$. In this case it is sometimes convenient to say that $E_1 - E_2$
is obtained by subtracting $E_2$ from $E_1$.

The proper difference $E_1 - E_2$ is sometimes called the complement of
$E_2$ relative to $E_1$ and we write

(1.2.13) $E_1 - E_2 = \bar{E}_2(E_1)$.

In the special but important case where $E_1$ is the entire sample space R
then $R - E_2$ is called the complement of $E_2$ and is written

(1.2.14) $R - E_2 = \bar{E}_2$.

More generally, we have

(1.2.15) $E_1 - E_2 = E_1 \cap \bar{E}_2$.

To illustrate schematically the notions of the union, intersection, and
difference of sets, we use a Venn diagram. Let the sample space R be
represented by the set of points inside the rectangle in Fig. 1.1 and let
$E_1$ and $E_2$ be the sets of points inside the large and small circles respectively.
Then the union $E_1 \cup E_2$ is the set of points enclosed by at least one of the
circles; the intersection $E_1 \cap E_2$ is the set of points inside both circles;
the difference $E_1 - E_2$ $(= E_1 \cap \bar{E}_2)$ is the set of points inside the larger
circle but not the smaller; $E_2 - E_1$ $(= \bar{E}_1 \cap E_2)$ has a similar
interpretation; $\bar{E}_1 \cap \bar{E}_2 = R - (E_1 \cup E_2)$ is the set of points inside the
rectangle but not contained within either circle.

* The union (sum) $E_1 \cup E_2$ is sometimes written as $E_1 + E_2$ and the intersection
(product) $E_1 \cap E_2$ as $E_1 \cdot E_2$.



Fig. 1.1 Venn diagram illustrating the four basic disjoint sets $E_1 \cap E_2$, $E_1 \cap \bar{E}_2$,
$\bar{E}_1 \cap E_2$, and $\bar{E}_1 \cap \bar{E}_2$ generated by $E_1$ and $E_2$.

These set theory notions have the following interpretations in the
language of events:*

(i) $E_1 \subset E_2$ means that the occurrence of $E_1$ implies the occurrence of
$E_2$, that is, a sample point e cannot occur in $E_1$ without occurring in
$E_2$.

(ii) $E = \phi$, the empty set, means that event E cannot occur.

(iii) $E = R$, the entire sample space, means that event E must occur.

(iv) The intersection $E_1 \cap E_2$ is the event consisting of the joint
occurrence of events $E_1$ and $E_2$.

(v) The union $E_1 \cup E_2$ is the event consisting of the occurrence of
at least one of the events $E_1$ and $E_2$.

(vi) The difference $E_1 - E_2$ is the event denoting occurrence of $E_1$
but not $E_2$.
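
The event interpretations (i) through (vi) map directly onto operations on finite sets. The sketch below is illustrative only: the die-rolling sample space, the events `E1` and `E2`, and the helper names `complement` and `occurs` are assumptions chosen for the example, not notation from the text.

```python
# Sample space for one roll of a die (an illustrative choice).
R = frozenset(range(1, 7))
E1 = frozenset({2, 4, 6})   # "an even number turns up"
E2 = frozenset({4, 5, 6})   # "a number greater than 3 turns up"


def complement(E, space=R):
    """Complement of E relative to the sample space."""
    return space - E


def occurs(E, e):
    """Event E occurs when the realized sample point e lies in E."""
    return e in E


e = 4                    # a realized outcome
both = E1 & E2           # (iv) joint occurrence of E1 and E2
at_least_one = E1 | E2   # (v) occurrence of at least one of E1, E2
only_first = E1 - E2     # (vi) occurrence of E1 but not E2

print(occurs(both, e), occurs(only_first, e))
print(sorted(complement(at_least_one)))
```

Python's `<=` on sets tests the subset relation in (i), so `both <= E1` expresses that the joint occurrence of the two events implies the occurrence of `E1`.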

* It should be noted that all the definitions and operations (1.2.1) through (1.2.15)
apply not only to sample points and sets of sample points (events) but also to elements
and sets of elements of any kind. The elements may be point sets, people, automobiles,
etc. For instance, if $\mathscr{F}$ is a collection of sets containing the set E, we write $E \in \mathscr{F}$; if
every set in the collection $\mathscr{F}$ is contained in the collection $\mathscr{G}$, we write $\mathscr{F} \subset \mathscr{G}$, and so on.






The reader can easily verify that the operations of taking the union and
product of sets are commutative, that is,

(1.2.16) $E_1 \cup E_2 = E_2 \cup E_1$, $E_1 \cap E_2 = E_2 \cap E_1$,

are associative, that is,

(1.2.17) $(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$,
$(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$,

and distributive, that is,

(1.2.18) $E_1 \cap (E_2 \cup E_3) = (E_1 \cap E_2) \cup (E_1 \cap E_3)$.

It is also evident that $E_1$ and the proper difference $E_2 - (E_1 \cap E_2)$ are
disjoint and that

(1.2.19) $E_1 \cup E_2 = E_1 \cup [E_2 - (E_1 \cap E_2)]$.

Similarly, $E_1 \cap E_2$ and the proper differences $E_1 - (E_1 \cap E_2)$ and
$E_2 - (E_1 \cap E_2)$ are disjoint such that

$E_1 \cup E_2 = (E_1 \cap E_2) \cup [E_1 - (E_1 \cap E_2)] \cup [E_2 - (E_1 \cap E_2)]$.

It should be observed that the intersection of any two subsets $E_1$ and
$E_2$ of R can be expressed in terms of unions and differences of sets by the
following formula:

(1.2.20) $E_1 \cap E_2 = R - (\bar{E}_1 \cup \bar{E}_2)$.

Similarly, we have for the union

(1.2.21) $E_1 \cup E_2 = R - (\bar{E}_1 \cap \bar{E}_2)$.

We also have the following relationships:

(1.2.22) $(E_1 \cap E_2) \subset E_\alpha$, $\alpha = 1, 2$,

and

(1.2.23) $E_\alpha \subset (E_1 \cup E_2)$, $\alpha = 1, 2$.
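
The identities (1.2.16) through (1.2.23) can be checked exhaustively on a small sample space. The sketch below does this for every triple of subsets of a three-point space, reading (1.2.20) and (1.2.21) as the complement (De Morgan) formulas. It is a finite check under these assumptions, not a proof, and the helper names `subsets`, `comp`, and `laws_hold` are hypothetical.

```python
from itertools import chain, combinations, product

R = frozenset({1, 2, 3})


def subsets(space):
    """All subsets of a finite set, as frozensets."""
    items = sorted(space)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def comp(E):
    """Complement relative to the sample space R."""
    return R - E


def laws_hold(E1, E2, E3):
    return all([
        E1 | E2 == E2 | E1, E1 & E2 == E2 & E1,          # (1.2.16)
        (E1 | E2) | E3 == E1 | (E2 | E3),                # (1.2.17)
        (E1 & E2) & E3 == E1 & (E2 & E3),
        E1 & (E2 | E3) == (E1 & E2) | (E1 & E3),         # (1.2.18)
        E1 | E2 == E1 | (E2 - (E1 & E2)),                # (1.2.19)
        E1 & E2 == R - (comp(E1) | comp(E2)),            # (1.2.20)
        E1 | E2 == R - (comp(E1) & comp(E2)),            # (1.2.21)
        (E1 & E2) <= E1, E1 <= (E1 | E2),                # (1.2.22), (1.2.23)
    ])


all_ok = all(laws_hold(a, b, c) for a, b, c in product(subsets(R), repeat=3))
print(all_ok)
```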

It will be useful for later sections to state extensions of several of the
preceding results to finite or infinite collections of sets. The proofs are
straightforward and are left to the reader.

1.2.1 For an arbitrary collection of sets $\{E_\alpha : \alpha \in T\}$,

(1.2.20a) $\bigcap_\alpha E_\alpha = R - \bigcup_\alpha \bar{E}_\alpha$,

(1.2.21a) $\bigcup_\alpha E_\alpha = R - \bigcap_\alpha \bar{E}_\alpha$.
An extension of (1.2.19) may be stated as follows:

1.2.2 If $E_1, E_2, \ldots$ is a sequence (finite or infinite) of sets, then $E_1$ and
the proper differences $E_2 - (E_2 \cap E_1)$, $E_3 - [E_3 \cap (E_2 \cup E_1)]$,
$\ldots$, $E_n - [E_n \cap (E_{n-1} \cup \cdots \cup E_1)]$, $\ldots$, are disjoint sets whose
union is $\bigcup_\alpha E_\alpha$.

1.2.2 provides a simple rule for decomposing the union of a sequence of
sets into a disjoint sequence of sets. It should also be noted that $E_1$
and the differences $E_2 - E_1$, $\ldots$, $E_n - (E_1 \cup \cdots \cup E_{n-1})$,
$\ldots$, are disjoint sets whose union is $\bigcup_\alpha E_\alpha$.
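
The decomposition rule in 1.2.2 amounts to removing from each set the points already covered by its predecessors. A minimal sketch for a finite sequence follows; the function name `disjoint_parts` and the sample sets are illustrative assumptions.

```python
def disjoint_parts(sets):
    """Return E1, E2 - E1, E3 - (E1 | E2), ... for a finite sequence of sets."""
    seen = set()    # union of the sets processed so far
    parts = []
    for E in sets:
        parts.append(set(E) - seen)
        seen |= set(E)
    return parts


seq = [{1, 2, 3}, {2, 3, 4}, {1, 5}, {4, 5}]
parts = disjoint_parts(seq)
print(parts)
```

Some of the resulting parts may be empty, as happens for the last set here, which is consistent with 1.2.2 since the empty set is disjoint from every set.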

If $E_1, E_2, \ldots$ is a countably infinite sequence of sets in R, the set $E^*$
consisting of all points which belong to infinitely many of the sets in this
sequence is called the superior limit of the sequence and will be written as

(1.2.24) $E^* = \limsup_\alpha E_\alpha$.

Similarly, the set $E_*$ consisting of all points which belong to all but a
finite number of sets in the sequence is called the inferior limit of the
sequence and is written as

(1.2.25) $E_* = \liminf_\alpha E_\alpha$.

If $E^* = E_*$ we say that the limit of the sequence $E_1, E_2, \ldots$ exists and we
denote it by

(1.2.26) $\lim_{\alpha \to \infty} E_\alpha$, or $\lim_\alpha E_\alpha$.

For an expanding or increasing sequence, that is, one for which
$E_1 \subset E_2 \subset \cdots$, it is evident that the limit of the sequence exists and
is equal to the union of all sets $\{E_\alpha\}$, that is,

(1.2.27) $\lim_\alpha E_\alpha = \bigcup_\alpha E_\alpha$.

For a contracting or decreasing sequence, that is, one for which
$E_1 \supset E_2 \supset \cdots$, it is seen that the limit exists and is equal to the
intersection of all sets, and we write

(1.2.28) $\lim_\alpha E_\alpha = \bigcap_\alpha E_\alpha$.

In the case of an arbitrary sequence of sets $E_1, E_2, \ldots$ in R, we have

$\limsup_\alpha E_\alpha = \bigcap_{n=1}^{\infty} \bigcup_{\alpha=n}^{\infty} E_\alpha$, $\liminf_\alpha E_\alpha = \bigcup_{n=1}^{\infty} \bigcap_{\alpha=n}^{\infty} E_\alpha$.
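
The superior and inferior limits cannot in general be computed from finitely many terms, but for a periodic sequence every tail repeats the same sets, so a finite horizon suffices. The sketch below rests on that assumption. The functions `lim_sup` and `lim_inf`, the `period` and `horizon` parameters, and the alternating example are all hypothetical illustrations of the intersection-of-tail-unions and union-of-tail-intersections formulas.

```python
def tail_union(seq, n, horizon):
    """Union of seq(a) for a = n, ..., horizon - 1."""
    return set().union(*[seq(a) for a in range(n, horizon)])


def tail_intersection(seq, n, horizon):
    """Intersection of seq(a) for a = n, ..., horizon - 1."""
    return set.intersection(*[set(seq(a)) for a in range(n, horizon)])


def lim_sup(seq, period, horizon=12):
    # Each truncated tail starts early enough to contain a full period,
    # so for a periodic sequence it equals the corresponding infinite tail.
    starts = range(1, horizon - period + 1)
    return set.intersection(*[tail_union(seq, n, horizon) for n in starts])


def lim_inf(seq, period, horizon=12):
    starts = range(1, horizon - period + 1)
    return set().union(*[tail_intersection(seq, n, horizon) for n in starts])


def alternating(a):
    """E_a = {1, 2} for odd a and {2, 3} for even a."""
    return {1, 2} if a % 2 else {2, 3}


print(lim_sup(alternating, period=2))
print(lim_inf(alternating, period=2))
```

For the alternating sequence the two limits differ, so the limit of the sequence does not exist; for a constant sequence they coincide.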

Expanding and contracting sequences are, by far, the most important
sequences in probability and statistics. If $E_1, E_2, \ldots$ is either an expanding
or a contracting sequence of sets, it is convenient to call it a monotone
sequence. Thus, the limit of a monotone sequence always exists. If the
limit is denoted by E, then $E = \bigcup_\alpha E_\alpha$ or $E = \bigcap_\alpha E_\alpha$ depending on whether

the sequence is expanding or contracting. We shall find in later sections 
and chapters that monotone sequences of sets play a fundamental role in 
probability theory. 

1.3 FIELDS OF SETS 

As indicated earlier, we shall be more interested in certain classes of 
subsets (events) of the sample space R than in the individual sample points 
in R. In particular, we shall be concerned with fields of sets, that is,
classes of sets, satisfying certain rules to be stated below.

A nonempty class $\mathscr{F}$ of sets in R is called a Boolean field of sets if it
satisfies the following properties:

A1 If $E \in \mathscr{F}$, then $\bar{E} \in \mathscr{F}$.

A2 If $E_1 \in \mathscr{F}$ and $E_2 \in \mathscr{F}$, then $E_1 \cup E_2 \in \mathscr{F}$.

A class $\mathscr{F}$ of sets is called a Borel field* of sets if the following additional
property is satisfied:

A3 If $E_1, E_2, \ldots$ is a countably infinite sequence of sets belonging to
$\mathscr{F}$, then $\bigcup_\alpha E_\alpha \in \mathscr{F}$.

Actually, A2 is superfluous in defining a Borel field of sets since it is
implied by A3. It will be seen that by choosing $E_2 = \bar{E}_1$ in A2, $E_1 \cup \bar{E}_1 =
R$, and hence $R \in \mathscr{F}$. Also, it will be noted that if R is chosen for E
in A1, then $\bar{E}$ is the empty set and hence the empty set is always contained
in a field of sets, whether Boolean or Borel.

It follows from successive applications of A2 that the union of any finite
number of sets belonging to a Boolean field $\mathscr{F}$ also belongs to $\mathscr{F}$. It can
be verified from A1 and A2 and (1.2.20) that the intersection of a finite
or countably infinite sequence of sets in a Borel field $\mathscr{F}$ is also in $\mathscr{F}$.

If R contains a finite number N of (distinct) sample points, then the class
of all possible events is finite and this class of events is clearly a Boolean
field. Actually, the Boolean field $\mathscr{F}$ in this case consists of the following
$2^N$ sets (events): the empty set; all one-point sets; all two-point sets, ...,
and finally R itself (the N-point set). If we denote by $\mathscr{F}_0$ the class consisting
of all N one-point sets, it is clearly possible to obtain any one of the finite
sets belonging to the Boolean field by a finite number of applications of
A1 and A2. Thus, the Boolean field $\mathscr{F}$ can be generated from the initial
class $\mathscr{F}_0$. It should be emphasized that there are many different ways in
which an initial class may be selected from which the Boolean field $\mathscr{F}$
of sets can be generated; the class of one-point sets already mentioned is
perhaps the simplest.

* A Boolean field of sets is also called a Boolean algebra or a finitely additive class of
sets; a Borel field of sets is also called a σ-algebra or a completely additive class of sets.
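
For a finite sample space, generating a Boolean field from an initial class can be carried out mechanically by applying A1 and A2 until nothing new appears. The following sketch assumes a four-point sample space; the function name `generate_field` and the particular initial classes are illustrative assumptions.

```python
from itertools import combinations


def generate_field(R, initial):
    """Close an initial class of subsets of a finite R under A1 and A2."""
    R = frozenset(R)
    field = {frozenset(E) for E in initial}
    changed = True
    while changed:
        changed = False
        current = list(field)
        for E in current:                       # A1: complements
            if R - E not in field:
                field.add(R - E)
                changed = True
        for E1, E2 in combinations(current, 2):  # A2: pairwise unions
            if E1 | E2 not in field:
                field.add(E1 | E2)
                changed = True
    return field


R = {1, 2, 3, 4}
from_points = generate_field(R, [{1}, {2}, {3}, {4}])
from_chain = generate_field(R, [{1}, {1, 2}, {1, 2, 3}])
coarse = generate_field(R, [{1}, {1, 2}])
print(len(from_points), len(from_chain), len(coarse))
```

The first two initial classes both generate the full field of all subsets, which illustrates that the initial class is not unique. The third generates a smaller Boolean field in which the points 3 and 4 are never separated.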

The Boolean field generated from a finite class of sets will, of course, 
contain only a finite number of events. But if R contains an infinite number 
of sample points—even a countably infinite number—a Boolean field 
provides us with an inadequate class of events for treating many problems 
of interest. 

A3 is introduced to make sure that machinery can be developed for 
dealing with wider classes of events when R contains infinitely many sample 
points. 

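Now suppose R is a sample space and $\mathscr{F}_0$ is a non-empty class of events
in R. Suppose $\mathscr{B}$ is any Borel field of sets which contains all the sets in
$\mathscr{F}_0$. That at least one such Borel field exists is evident, since $\mathscr{B}$ may be
taken as the class of all possible subsets of R, which, of course, contains
the class $\mathscr{F}_0$. Suppose $\mathscr{B}(\mathscr{F}_0)$ is the class consisting of all sets which
belong to every Borel field containing $\mathscr{F}_0$, that is, $\mathscr{B}(\mathscr{F}_0)$ is the
intersection of all Borel fields containing the class $\mathscr{F}_0$. The sets in $\mathscr{B}(\mathscr{F}_0)$
satisfy A1, A2, and A3, and hence $\mathscr{B}(\mathscr{F}_0)$ is a Borel field.

Furthermore, by definition, $\mathscr{B}(\mathscr{F}_0)$ is contained in every Borel field
containing $\mathscr{F}_0$. Consequently, it is the unique smallest Borel field
containing $\mathscr{F}_0$. $\mathscr{B}(\mathscr{F}_0)$ is called the Borel field generated by the class $\mathscr{F}_0$.

We may summarize as follows:

1.3.1 Let $\mathscr{F}_0$ be a non-empty class of sets in R. Then, there exists a unique
Borel field $\mathscr{B}(\mathscr{F}_0)$ containing $\mathscr{F}_0$ such that if $\mathscr{B}$ is any Borel field
containing $\mathscr{F}_0$, $\mathscr{B} \supset \mathscr{B}(\mathscr{F}_0)$.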

In the special but important case where the sample space R is the axis of
real numbers $R_1$ (the sample points e being the real numbers) and where
$\mathscr{F}_0$, which may be referred to as the initial class, is taken as the class of all
half-open intervals* $(a, b]$, the sets of the Borel field $\mathscr{B}(\mathscr{F}_0)$ are called
Borel sets of the real line, and their class will be denoted by $\mathscr{B}_1$.

We obtain the same Borel field if $\mathscr{F}_0$ is chosen as the class of intervals of
type $(a, b)$, or $[a, b)$, or $[a, b]$, or more generally as the class of all open
sets of $R_1$ or the class of all closed sets of $R_1$. The Borel sets of $R_1$
constitute a class which includes among its sets all sets which can be obtained
by performing finite or countable unions, intersections, complements, and
differences, starting with a finite or countably infinite number of intervals.

* We shall use the usual convention of letting $(a, b]$ denote the interval $a < x \le b$,
with corresponding meanings for $(a, b)$, $[a, b)$, and $[a, b]$. If x is a k-dimensional real
vector, that is, if $(x_1, \ldots, x_k)$ is a point in k-dimensional Euclidean space, we shall let
$(a_1, \ldots, a_k; b_1, \ldots, b_k]$ or more briefly $(a; b]_k$ denote the k-dimensional interval
$a_i < x_i \le b_i$, $i = 1, \ldots, k$, with corresponding meanings for $(a; b)_k$, $[a; b)_k$, and $[a; b]_k$.

More generally, if the sample space R is the k-dimensional Euclidean
space $R_k$, and if the initial class $\mathscr{F}_0$ is taken as the class of all k-dimensional
half-open intervals $(a; b]_k$, the Borel field $\mathscr{B}(\mathscr{F}_0)$ thus generated is called
the Borel sets of $R_k$ and will be denoted by $\mathscr{B}_k$. There are, of course, other
similar choices of the initial class which generate the Borel sets of $R_k$,
such as the class of intervals of type $(a; b)_k$, $[a; b)_k$, or $[a; b]_k$. It should
be particularly noted that the class of Borel sets in $R_k$ includes any set
which can be formed by a finite or countably infinite number of unions,
intersections, differences, or complements, starting with intervals of any
of the types mentioned above.

1.4 PROBABILITY MEASURE 

In the preceding sections we have been concerned with the description 
and manipulation of events. We now consider the problem of assigning 
probabilities to these events and of setting up rules for manipulating 
probabilities. A common notion of the probability of an event is that it is 
an abstraction of the idea of the relative frequency with which an event 
occurs in a sequence of trials of an experiment under “a given set of 
conditions.” Thus, suppose E is an event in the basic sample space R 
and that each time the “basic” experiment is performed the outcome 
corresponds to some sample point e in R. If the experiment is performed 
m times, let $m_E$ be the number of times the resulting event point e belongs
to E, that is, the frequency with which E occurred. The relative frequency
of E in the m trials is $m_E/m$. This ratio clearly lies on the interval [0, 1];
it necessarily has the value 1 if $E = R$, and 0 if E is the empty set. If
$E_1$ and $E_2$ are disjoint events, the frequency with which $E_1 \cup E_2$ occurs is
$m_{E_1} + m_{E_2}$ and the relative frequency of $E_1 \cup E_2$ is $(m_{E_1} + m_{E_2})/m =
m_{E_1}/m + m_{E_2}/m$.
This shows that the relative frequency of the union 
of two disjoint events is equal to the sum of the relative frequencies of the 
two events. A similar expression relates the relative frequencies of any 
finite number of disjoint events to the relative frequency of their union. 
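
The properties of relative frequency just described can be observed in a simulated series of trials. The sketch below assumes a fair six-sided die as the "basic" experiment and uses exact rational arithmetic so that the additivity for disjoint events holds exactly rather than up to rounding. The function name `relative_frequency`, the seed, and the number of trials are arbitrary illustrative choices.

```python
import random
from fractions import Fraction


def relative_frequency(event, outcomes):
    """m_E / m: the fraction of trials whose outcome lies in the event."""
    m_E = sum(1 for e in outcomes if e in event)
    return Fraction(m_E, len(outcomes))


rng = random.Random(0)                      # fixed seed for repeatability
outcomes = [rng.randint(1, 6) for _ in range(1000)]

R = {1, 2, 3, 4, 5, 6}
E1 = {1, 2}
E2 = {5}                                    # disjoint from E1

f1 = relative_frequency(E1, outcomes)
f2 = relative_frequency(E2, outcomes)
f_union = relative_frequency(E1 | E2, outcomes)
print(f_union == f1 + f2)
```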

If we think of the relative frequency of an event £ in an (obviously 
unperformed!) indefinitely long series of trials of our “basic” experiment, 
we are not far from a reasonable interpretation of the probability of an 
event £. However, certain cumbersome difficulties, which need not be 
discussed here, arise if we try to establish a theory of probability by
strictly formalizing the idea of relative frequency, on the assumption that 
convergence properties in an indefinitely long series of trials hold. This 
approach has been considered by von Mises (1931). A simpler and perhaps 
more fundamental formulation proposed originally by Kolmogorov (1933a)
consists essentially of assuming that in dealing with any event of interest 
in a probability problem, numbers on the interval [0,1] can be assigned as 
probabilities of events in some initial class of relatively simple events, 
and that these initial probabilities, together with rules for determining 
probabilities of more complex events, will enable one to determine the 
probability of any event in a class of events broad enough to include that 
of interest in the original probability problem. In the actual assignment of
probabilities to events, one is usually guided by an hypothesis based on 
what one expects the relative frequencies of the events to be in a large 
series of trials of the “basic” experiment. Under the Kolmogorov 
formulation, the important point is that the mathematical theory begins 
after the assignment of the probabilities. We can, of course, question 
whether the probabilities are correctly assigned in any given situation. 
A formal treatment of this question is a problem in testing statistical 
hypotheses which will be considered in later chapters. 

We formalize these ideas by defining probability measure or a probability 
distribution for a class of events. 

A set function P defined for all sets in a Boolean field $\mathscr{F}$ and having the
following three properties will be referred to as a probability measure on
the Boolean field $\mathscr{F}$:

B1 For every event E in $\mathscr{F}$ there is associated a real non-negative number
P(E), called the probability of event E. P(E) is sometimes written as
$P(e \in E)$.

B2 If $E_1, E_2, \ldots$ is a countably infinite sequence of mutually disjoint sets
in $\mathscr{F}$ whose union is in $\mathscr{F}$, then

$P\left(\bigcup_{\alpha=1}^{\infty} E_\alpha\right) = \sum_{\alpha=1}^{\infty} P(E_\alpha)$.

B3 $P(R) = 1$.

If P is a set function defined for all sets in a Borel field $\mathscr{F}$ and satisfying
B1, B2, and B3, P is called a probability measure on the Borel field $\mathscr{F}$.
In this case, of course, $\bigcup_{\alpha=1}^{\infty} E_\alpha$ belongs to $\mathscr{F}$ by definition of a Borel field.

The triple $(R, \mathscr{F}, P)$ is called a probability space. It should be noted
that B2 holds for all finite collections of disjoint sets, say $E_1, \ldots, E_n$,
since $E_{n+1}, E_{n+2}, \ldots$ can all be taken as the empty set $\phi$, and $P(\phi) = 0$.

A set function P such that P(E) is finite for every $E \in \mathscr{F}$ which satisfies
B1 (without the restriction of non-negativeness) and B2 is sometimes
called a completely additive set function.

We are usually interested in the minimal Borel field $\mathscr{B}(\mathscr{F})$ generated by 
some Boolean field $\mathscr{F}$, in which case, of course, $\mathscr{B}(\mathscr{F})$ contains $\mathscr{F}$. Thus, 
we may state several simple but useful theorems concerning probabilities 
of events in $\mathscr{B}(\mathscr{F})$ which, of course, automatically hold for events in $\mathscr{F}$, 
since a probability measure on $\mathscr{B}(\mathscr{F})$ also provides probabilities for all 
sets in $\mathscr{F}$. 

1.4.1 If $E$ is any event in $\mathscr{B}(\mathscr{F})$ then $P(E) + P(\bar{E}) = 1$. 

It follows at once from 1.4.1 and B3 that $P(\emptyset) = 0$. A non-null event 
$E$ for which $P(E) = 0$ is sometimes referred to as an event (or set) of zero 
probability. 

1.4.2 If $E_1$ and $E_2$ are events in $\mathscr{B}(\mathscr{F})$ and $E_1 \subset E_2$, then 
$0 \le P(E_2 - E_1) = P(E_2) - P(E_1)$, and $P(E_2) \ge P(E_1)$. 

1.4.3 If $E_1$ and $E_2$ are events in $\mathscr{B}(\mathscr{F})$, then 
$P(E_1 \cup E_2) = P(E_1) + P(E_2 - E_1 \cap E_2)$. 
Also $P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2)$. 

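1.4.4 More generally, if $E_1, \ldots, E_n$ are $n$ sets belonging to $\mathscr{B}(\mathscr{F})$, then 
we have 

$$P\Big(\bigcup_{\alpha=1}^{n} E_\alpha\Big) = P(E_1) + P(E_2 - E_2 \cap E_1) + \cdots + P(E_n - E_n \cap [E_1 \cup \cdots \cup E_{n-1}]), \tag{1.4.1}$$

also 

$$P\Big(\bigcup_{\alpha=1}^{n} E_\alpha\Big) = \sum_{\alpha=1}^{n} P(E_\alpha) - \sum_{\beta>\alpha=1}^{n} P(E_\alpha \cap E_\beta) + \cdots + (-1)^{n-1} P(E_1 \cap \cdots \cap E_n). \tag{1.4.1a}$$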

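Proofs of 1.4.1, 1.4.2, 1.4.3, and 1.4.4 are left as exercises for the reader. 
If we consider an infinite sequence of sets $E_1, E_2, \ldots$ in $\mathscr{B}(\mathscr{F})$, then 
(1.4.1) and (1.4.1a) become, respectively, 

$$P\Big(\bigcup_{\alpha=1}^{\infty} E_\alpha\Big) = P(E_1) + P(E_2 - E_2 \cap E_1) + P(E_3 - E_3 \cap [E_1 \cup E_2]) + \cdots \tag{1.4.1$'$}$$

and (assuming convergence of the sums on the right), 

$$P\Big(\bigcup_{\alpha=1}^{\infty} E_\alpha\Big) = \sum_{\alpha=1}^{\infty} P(E_\alpha) - \sum_{\beta>\alpha=1}^{\infty} P(E_\alpha \cap E_\beta) + \sum_{\gamma>\beta>\alpha=1}^{\infty} P(E_\alpha \cap E_\beta \cap E_\gamma) - \cdots. \tag{1.4.1a$'$}$$

Note that (1.4.1) and (1.4.1a) are special cases of (1.4.1)' and (1.4.1a)' 
respectively, obtained by taking $E_{n+1}, E_{n+2}, \ldots$ as empty sets. 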
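
A small numerical check of (1.4.1) and (1.4.1a) may help fix ideas. The sketch 
below assumes a sample space of 36 equally likely outcomes for two dice and 
three illustrative events; these choices, and the names `R`, `P`, and `events`, 
are assumptions made for the example and do not come from the text. 

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical finite sample space: the 36 outcomes of two dice, equally likely.
R = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def P(event):
    """Probability of an event (a set of sample points) under equal weights."""
    return Fraction(len(event), len(R))

# Three illustrative, overlapping events.
E1 = {e for e in R if e[0] == 6}          # first die shows 6
E2 = {e for e in R if e[1] == 6}          # second die shows 6
E3 = {e for e in R if sum(e) >= 10}       # total is at least 10
events = [E1, E2, E3]

# Left side of (1.4.1) and (1.4.1a): probability of the union.
lhs = P(set().union(*events))

# Right side of (1.4.1a): alternating sum over all k-fold intersections.
rhs = Fraction(0)
for k in range(1, len(events) + 1):
    for combo in combinations(events, k):
        rhs += (-1) ** (k - 1) * P(set.intersection(*combo))

# Right side of (1.4.1): disjoint pieces E_a - E_a ∩ (E_1 ∪ ... ∪ E_{a-1}).
seen, disjoint_sum = set(), Fraction(0)
for E in events:
    disjoint_sum += P(E - seen)
    seen |= E
```

Exact rational arithmetic is used so that the three quantities can be compared 
for equality rather than to within rounding error. 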

The following theorem is of basic importance in Chapter 2: 

1.4.5 If $E_1, E_2, \ldots$ is a monotone sequence of sets in the Borel field $\mathscr{B}(\mathscr{F})$, 
then 

$$\lim_{\alpha\to\infty} P(E_\alpha) = P\Big(\lim_{\alpha\to\infty} E_\alpha\Big).$$

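To prove 1.4.5, we consider first the case of an increasing sequence 
$E_1 \subset E_2 \subset \cdots$. Then $E_1$ and $E_\alpha - E_{\alpha-1}$, $\alpha = 2, 3, \ldots$, are 
mutually disjoint, such that 

$$E_\alpha = E_1 \cup (E_2 - E_1) \cup \cdots \cup (E_\alpha - E_{\alpha-1}). \tag{1.4.2}$$

Taking limits as $\alpha \to \infty$, we have 

$$\lim_{\alpha\to\infty} E_\alpha = E_1 \cup (E_2 - E_1) \cup \cdots. \tag{1.4.3}$$

Applying B2, we have 

$$P\Big(\lim_{\alpha\to\infty} E_\alpha\Big) = P(E_1) + P(E_2 - E_1) + \cdots. \tag{1.4.4}$$

But 

$$P(E_1) + P(E_2 - E_1) + \cdots = \lim_{\alpha\to\infty} \big[P(E_1) + P(E_2 - E_1) + \cdots + P(E_\alpha - E_{\alpha-1})\big] = \lim_{\alpha\to\infty} P\big(E_1 \cup (E_2 - E_1) \cup \cdots \cup (E_\alpha - E_{\alpha-1})\big). \tag{1.4.5}$$

Making use of (1.4.2) we have 

$$\lim_{\alpha\to\infty} P\big(E_1 \cup (E_2 - E_1) \cup \cdots \cup (E_\alpha - E_{\alpha-1})\big) = \lim_{\alpha\to\infty} P(E_\alpha).$$

Therefore, we conclude that 

$$P\Big(\lim_{\alpha\to\infty} E_\alpha\Big) = \lim_{\alpha\to\infty} P(E_\alpha), \tag{1.4.6}$$

thus completing the argument for 1.4.5 for the case of an expanding 
sequence of sets. 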

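For a contracting sequence of events $E_1 \supset E_2 \supset \cdots$, the complements 
$\bar{E}_1, \bar{E}_2, \ldots$ form an expanding sequence of events and it follows from the 
argument just given that 

$$P\Big(\lim_{\alpha\to\infty} \bar{E}_\alpha\Big) = \lim_{\alpha\to\infty} P(\bar{E}_\alpha), \tag{1.4.7}$$

that is, 

$$P\Big(\lim_{\alpha\to\infty} (R - E_\alpha)\Big) = \lim_{\alpha\to\infty} P(R - E_\alpha), \tag{1.4.8}$$

which may be written as 

$$P\Big(R - \lim_{\alpha\to\infty} E_\alpha\Big) = \lim_{\alpha\to\infty} P(R - E_\alpha). \tag{1.4.9}$$

Using 1.4.2 this reduces to (1.4.6), where, of course, in the contracting case 

$$\lim_{\alpha\to\infty} E_\alpha = \bigcap_{\alpha=1}^{\infty} E_\alpha.$$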

The following covering theorem will be useful in later sections: 

1.4.6 If $E, E_1, E_2, \ldots$ are events in $\mathscr{B}(\mathscr{F})$ such that $E \subset \bigcup_{\alpha=1}^{\infty} E_\alpha$, then 

$$P(E) \le \sum_{\alpha=1}^{\infty} P(E_\alpha).$$

Whenever $E, E_1, E_2, \ldots$ is a finite or infinite sequence of sets such that 
$E \subset \bigcup_{\alpha=1}^{\infty} E_\alpha$ we shall say that $\bigcup_{\alpha=1}^{\infty} E_\alpha$ is a covering for $E$. 

To establish 1.4.6, it is sufficient to note that since $E \subset \bigcup_{\alpha=1}^{\infty} E_\alpha$ we can 
write 

$$E = E \cap \Big(\bigcup_{\alpha=1}^{\infty} E_\alpha\Big). \tag{1.4.10}$$

Applying 1.2.2 we find that 

$$E = [E \cap E_1] \cup [E \cap (E_2 - E_2 \cap E_1)] \cup \cdots. \tag{1.4.11}$$

The sets in [ ] are disjoint and belong to $\mathscr{B}(\mathscr{F})$. Hence 

$$P(E) = P(E \cap E_1) + P(E \cap (E_2 - E_2 \cap E_1)) + \cdots. \tag{1.4.12}$$

But 

$$[E \cap E_1] \subset E_1, \quad [E \cap (E_2 - E_2 \cap E_1)] \subset E_2, \quad \ldots.$$

Therefore, applying 1.4.2, we find 

$$P(E) \le P(E_1) + P(E_2) + \cdots, \tag{1.4.13}$$

which concludes the argument for 1.4.6. 

By taking $E_{n+1} = E_{n+2} = \cdots = \emptyset$ it will be seen that the formula in 
1.4.6 reduces to 

$$P(E) \le \sum_{\alpha=1}^{n} P(E_\alpha),$$

which would, of course, also hold if $E, E_1, \ldots, E_n$ are sets in $\mathscr{F}$. 

Remark. If the sample space $R$ contains only a finite number of sample points, 
it is evident that if one takes as an initial class of events $\mathscr{F}_0$ all one-point events 
and assigns a probability to every element in $\mathscr{F}_0$, then this assignment of 
probabilities determines a Boolean probability measure. This means that we can 
determine the probability of any event in the Boolean field $\mathscr{F}$ generated by $\mathscr{F}_0$ 
from the probabilities assigned to the sets in $\mathscr{F}_0$. 
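
The construction in the Remark can be tried out directly. The following is a 
minimal sketch, assuming a hypothetical four-point sample space with point 
probabilities chosen for illustration (they must sum to 1 for B3 to hold); it 
builds every event from the one-point events and checks B1, B3, the finite form 
of B2, and 1.4.1. 

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical sample space with four sample points and assumed point probabilities.
point_prob = {"a": Fraction(1, 2), "b": Fraction(1, 4),
              "c": Fraction(1, 8), "d": Fraction(1, 8)}
R = frozenset(point_prob)

def P(event):
    """P(E) is the sum of the probabilities of the one-point events in E."""
    return sum((point_prob[e] for e in event), Fraction(0))

# The Boolean field generated by the one-point events is the class of all subsets.
field = [frozenset(s) for s in chain.from_iterable(
    combinations(sorted(R), k) for k in range(len(R) + 1))]

# B1: non-negativity; B3: P(R) = 1.
b1_holds = all(P(E) >= 0 for E in field)
b3_holds = P(R) == 1

# B2 (finite form): additivity over every pair of disjoint events.
b2_holds = all(P(E | F) == P(E) + P(F)
               for E in field for F in field if not (E & F))

# 1.4.1: P(E) + P(complement of E) = 1 for every event.
complement_rule = all(P(E) + P(R - E) == 1 for E in field)
```

With four sample points the field contains $2^4 = 16$ events, which is small 
enough to check every pair exhaustively. 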


1.5 EXTENSION OF A PROBABILITY MEASURE 

(a) Uniqueness of Extension of Probability Measure 

It will be recalled from Section 1.3 that if one starts with any initial 
Boolean field $\mathscr{F}$ there is a minimal Borel field $\mathscr{B}(\mathscr{F})$ containing $\mathscr{F}$. Now 
suppose we have a probability measure on the Boolean field $\mathscr{F}$; the 
question arises whether we can perform certain operations on the 
probabilities defined for sets in $\mathscr{F}$ so as to obtain a probability measure 
defined on $\mathscr{B}(\mathscr{F})$ without changing any of the probabilities already 
assigned to sets in $\mathscr{F}$. This can be done. It can be regarded as extension 
(or generation) of a probability measure on $\mathscr{B}(\mathscr{F})$ from a probability 
measure on $\mathscr{F}$. 

Before considering whether there exists a method of extending a 
probability measure on $\mathscr{F}$ to one on $\mathscr{B}(\mathscr{F})$, we state the following theorem 
on the uniqueness of an extension: 

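1.5.1 Let $\mathscr{F}$ be a Boolean field of sets and let $P_1$ and $P_2$ be probability 
measures defined on the Borel field $\mathscr{B}(\mathscr{F})$. If $P_1(E) = P_2(E)$ for 
every $E \in \mathscr{F}$ then $P_1(E) = P_2(E)$ for every $E \in \mathscr{B}(\mathscr{F})$. 

If we let $\mathscr{B}'$ be the class of sets in $\mathscr{B}(\mathscr{F})$ such that for any $E \in \mathscr{B}'$, 
$P_1(E) = P_2(E)$, it can be verified that $\mathscr{B}'$ is a Borel field, and, of course, 
contains the (minimal) Borel field $\mathscr{B}(\mathscr{F})$. But since $\mathscr{B}' \subset \mathscr{B}(\mathscr{F})$, we 
conclude that $\mathscr{B}' = \mathscr{B}(\mathscr{F})$ and hence $P_1(E) = P_2(E)$ for any $E$ in $\mathscr{B}(\mathscr{F})$. 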

Theorem 1.5.1 essentially states that there cannot be two probability 
measures defined on the Borel field $\mathscr{B}(\mathscr{F})$ generated by a Boolean field $\mathscr{F}$ 
which are equal on sets in $\mathscr{F}$ but not equal on sets in $\mathscr{B}(\mathscr{F}) - \mathscr{F}$. Thus, if 
we can find a way of starting with a probability measure on a Boolean 
field $\mathscr{F}$ and extending it to the Borel field $\mathscr{B}(\mathscr{F})$, then it follows from 
1.5.1 that the probability measure will be unique. 

(b) Extension of Probability Measure from $\mathscr{F}$ to $\mathscr{B}(\mathscr{F})$ 

Such an extension as discussed above can be achieved by using 
Carathéodory’s (1927) theory of outer measure. [Also see Halmos (1950).] 
In defining outer measure for this problem we begin with a probability 
measure $P$ for sets in a Boolean field $\mathscr{F}$. We then take any set $E$ in $R$ 
and let $E_1, E_2, \ldots$ be a sequence of sets in $\mathscr{F}$ such that $E \subset \bigcup_{\alpha=1}^{\infty} E_\alpha$, that 
is, $\bigcup_{\alpha=1}^{\infty} E_\alpha$ is a covering for $E$. The outer measure of $E$, say $P^*(E)$, is defined 
as the greatest lower bound of $\sum_{\alpha=1}^{\infty} P(E_\alpha)$ for all possible coverings 
constructed from sets in $\mathscr{F}$, that is, 

$$P^*(E) = \inf \sum_{\alpha=1}^{\infty} P(E_\alpha). \tag{1.5.1}$$

Since for any Boolean field $\mathscr{F}$ we can take $E_1 = R$, $E_2 = E_3 = \cdots = \emptyset$ 
as a covering for any set $E$ in $R$, it is evident that $P^*(E)$ exists for every set 
in $R$. Actually, however, we shall be interested only in $P^*(E)$ for sets in 
$\mathscr{B}(\mathscr{F})$. 
The following properties of $P^*$ can be verified by the reader: 

(1.5.2) $P^*(\emptyset) = 0$. 

(1.5.3) $P^*(R) = 1$. 

(1.5.4) $P^*(E) = P(E)$, if $E \in \mathscr{F}$. 

(1.5.5) $P^*(E) \le P^*(F)$, if $E \subset F$. 

(1.5.6) $P^*(E) \le \sum_{\alpha} P^*(E_\alpha)$, if $E \subset \bigcup_{\alpha} E_\alpha$. 

The basic theorem for the extension of the probability measure $P$ 
on a Boolean field $\mathscr{F}$ to a probability measure on the Borel field $\mathscr{B}(\mathscr{F})$ 
can be stated as follows: 

1.5.2 Let $P$ be a probability measure on a Boolean field $\mathscr{F}$. Then $P^*(E)$ 
as defined in (1.5.1) is a probability measure on $\mathscr{B}(\mathscr{F})$ such that 
$P^*(E) = P(E)$ for every $E \in \mathscr{F}$. 

For proof, the reader is referred to Halmos (1950). 
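
The definition (1.5.1) can be illustrated on a very small case. The sketch below 
assumes a hypothetical six-point sample space whose Boolean field consists of all 
unions of three two-point "atoms", with atom probabilities chosen for the 
example. In this finite setting the infimum is attained, and it is enough to 
search over single covering sets, as the docstring explains; this is a 
simplification for the finite case, not the general construction. 

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical setup: R = {0,...,5}, and the Boolean field F consists of all
# unions of the three "atoms" below, with assumed atom probabilities.
atoms = {frozenset({0, 1}): Fraction(1, 2),
         frozenset({2, 3}): Fraction(1, 3),
         frozenset({4, 5}): Fraction(1, 6)}

def field_sets():
    """Yield (set, probability) for every set in the Boolean field."""
    items = list(atoms.items())
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            members = frozenset().union(*(a for a, _ in combo))
            yield members, sum((p for _, p in combo), Fraction(0))

def outer_measure(E):
    """Smallest P(F) over field sets F that cover E.

    By the covering theorem 1.4.6, a covering by several field sets never has
    a smaller total than the single field set equal to their union, so it is
    enough to search over single covering sets.
    """
    E = frozenset(E)
    return min(p for F, p in field_sets() if E <= F)
```

For instance, the one-point set {0} is not in the field, and its outer measure 
is the probability of the atom {0, 1} that contains it. 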

1.6 STATISTICAL INDEPENDENCE 

(a) Cartesian Products of Sample Spaces 

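The Cartesian product $R$ of the sample spaces $R^{(1)}$ and $R^{(2)}$ is the set of 
all ordered pairs $(e^{(1)}, e^{(2)})$, where $e^{(1)} \in R^{(1)}$ and $e^{(2)} \in R^{(2)}$. We write 

$$R = R^{(1)} \times R^{(2)}. \tag{1.6.1}$$

Similarly, if $E^{(1)}$ and $E^{(2)}$ are events in $R^{(1)}$ and $R^{(2)}$ respectively, the 
Cartesian product of $E^{(1)}$ and $E^{(2)}$ is the event $E$ whose sample points 
$e = (e^{(1)}, e^{(2)})$ have the property that $e^{(1)} \in E^{(1)}$ and $e^{(2)} \in E^{(2)}$. We write 

$$E = E^{(1)} \times E^{(2)}. \tag{1.6.2}$$

$E$ is an empty set in $R$ if and only if $E^{(1)}$ or $E^{(2)}$ is an empty set. Also it 
is evident that if $E^{(1)}$ and $F^{(1)}$ are non-empty events in $R^{(1)}$ and $E^{(2)}$ and $F^{(2)}$ are 
non-empty events in $R^{(2)}$, then $(E^{(1)} \times E^{(2)}) \subset (F^{(1)} \times F^{(2)})$ if and only if 
$E^{(1)} \subset F^{(1)}$ and $E^{(2)} \subset F^{(2)}$, and furthermore that 
$E^{(1)} \times E^{(2)} = F^{(1)} \times F^{(2)}$ if and only if $E^{(1)} = F^{(1)}$ and $E^{(2)} = F^{(2)}$. 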

We sometimes say that $E$ defined by (1.6.2) is the joint occurrence of 
$E^{(1)}$ and $E^{(2)}$. Also, it is convenient to say that $E^{(1)}$ is the projection of 
$E$ onto $R^{(1)}$, a similar statement holding for $E^{(2)}$. $R^{(1)}$ and $R^{(2)}$ are called 
component or marginal sample spaces of $R$. If we take all points $(e^{(1)}, e^{(2)})$ 
in $R$ for which $e^{(1)} \in E^{(1)}$ we obtain a cylinder set in $R$ which is, in 
fact, the Cartesian product $E^{(1)} \times R^{(2)}$. Similarly, $R^{(1)} \times E^{(2)}$ is the 
cylinder set for which $e^{(2)} \in E^{(2)}$. Thus, $E^{(1)} \times E^{(2)}$, $E^{(1)} \times R^{(2)}$, and 
$R^{(1)} \times E^{(2)}$ are events in the sample space $R = R^{(1)} \times R^{(2)}$ such that the Cartesian 
product $E^{(1)} \times E^{(2)}$ is the intersection of the cylinder sets $E^{(1)} \times R^{(2)}$ 
and $R^{(1)} \times E^{(2)}$, that is, 

$$E^{(1)} \times E^{(2)} = (E^{(1)} \times R^{(2)}) \cap (R^{(1)} \times E^{(2)}). \tag{1.6.3}$$

We remark with special emphasis that events $E$ in $R$ of the Cartesian 
product type form a special class of events which play an important role 
in probability theory. 

The notion of Cartesian products extends in a straightforward manner 
to any finite or countably infinite number of events $E^{(1)}, E^{(2)}, \ldots$ in sample 
spaces $R^{(1)}, R^{(2)}, \ldots$, respectively. 

Examples. As an illustration of Cartesian products of sample spaces and 
events, consider the special case where $R^{(1)}$ and $R^{(2)}$ are sets of points 
on the axis of real numbers $R_1$. Let $R^{(1)}$ be the closed interval $[a_1, b_1]$ on the 
$e^{(1)}$-axis; $R^{(2)}$ the closed interval $[a_2, b_2]$ on the $e^{(2)}$-axis; $E^{(1)}$ a closed set in $R^{(1)}$ 
and $E^{(2)}$ a closed set in $R^{(2)}$ as shown in Fig. 1.2. The Cartesian product space 
$R = R^{(1)} \times R^{(2)}$ consists of all points in the large rectangle ABCD including its 
boundary. The cylinder set $E^{(1)} \times R^{(2)}$ consists of the set of points contained in 
the cross-hatched vertical strips, and the cylinder set $R^{(1)} \times E^{(2)}$ consists of the 
set of points contained in the cross-hatched horizontal strips. The Cartesian 
product $E = E^{(1)} \times E^{(2)}$ consists of the four black rectangular sets of points. 
Thus, $E$ is the intersection of the two cylinder sets referred to in $R$, that is, 
$E = (E^{(1)} \times R^{(2)}) \cap (R^{(1)} \times E^{(2)})$. 

It must be emphasized that the operation of taking Cartesian products of 
events can be applied to events which are more general than those represented by 
real numbers, that is, those represented by sets of points in a line, plane or in a 
multidimensional Euclidean space. For example, in playing two hands of bridge 
suppose $e^{(1)}$ is the hand player A obtains on the first deal, and $e^{(2)}$ is the hand he 
obtains on the second deal. Then the number of sample points $e^{(1)}$ and $e^{(2)}$ 
comprising event spaces $R^{(1)}$ and $R^{(2)}$, respectively, is $\binom{52}{13}$ in each case. 
Furthermore, the $\binom{52}{13}^2$ possible pairs of hands $(e^{(1)}, e^{(2)})$ A could obtain on the 
two deals correspond to the sample points of the Cartesian product space 
$R = R^{(1)} \times R^{(2)}$. Thus, if $E^{(1)}$ denotes the subset of $R^{(1)}$ consisting of all hands 
with exactly 2 aces and $E^{(2)}$ the subset of $R^{(2)}$ consisting of all hands of exactly 
3 kings, then $E = E^{(1)} \times E^{(2)}$ is the event in $R$ containing all pairs of hands 
$(e^{(1)}, e^{(2)})$ such that the first hand contains exactly 2 aces and the second hand 
exactly 3 kings. There are 

$$\left[\binom{4}{2}\binom{48}{11}\right]\left[\binom{4}{3}\binom{48}{10}\right]$$

such pairs of hands, that is, sample points in $E$. 
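
The counts in the bridge example can be evaluated with the standard-library 
function `math.comb`. The variable names below are chosen for this illustration 
only. 

```python
from math import comb

# Number of 13-card hands: sample points in each component space.
hands = comb(52, 13)

# Sample points in the product space R = R(1) x R(2).
pairs_of_hands = hands ** 2

# E(1): hands with exactly 2 aces; E(2): hands with exactly 3 kings.
two_aces = comb(4, 2) * comb(48, 11)
three_kings = comb(4, 3) * comb(48, 10)

# Sample points in E = E(1) x E(2).
points_in_E = two_aces * three_kings
```

The number of sample points in a Cartesian product of events is the product of 
the numbers of sample points in the component events, which is all the last 
line uses. 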



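Suppose $R^{(1)}$ and $R^{(2)}$ are two sample spaces and that $\mathscr{B}(\mathscr{F}^{(1)})$ and 
$\mathscr{B}(\mathscr{F}^{(2)})$ are the Borel fields generated by Boolean fields of sets $\mathscr{F}^{(1)}$ 
and $\mathscr{F}^{(2)}$ in these two sample spaces, respectively. Now consider the class 
of sets $\mathscr{F}_0$ in $R = R^{(1)} \times R^{(2)}$ of type $E = E^{(1)} \times E^{(2)}$, where $E^{(1)} \in \mathscr{F}^{(1)}$ 
and $E^{(2)} \in \mathscr{F}^{(2)}$. Starting with $\mathscr{F}_0$ as the initial class, it can be used to 
generate a minimal Borel field of sets in $R$ which will be designated by 

$$\mathscr{B}(\mathscr{F}_0). \tag{1.6.4}$$

As a matter of fact it can be shown that this Borel field can also be 
generated by taking as the initial class the sets of type $E^{(1)} \times E^{(2)}$ 
where $E^{(1)} \in \mathscr{B}(\mathscr{F}^{(1)})$ and $E^{(2)} \in \mathscr{B}(\mathscr{F}^{(2)})$. 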

(b) Products of Probability Measures 

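Now suppose $P^{(1)}$ and $P^{(2)}$ are probability measures on $\mathscr{B}(\mathscr{F}^{(1)})$ and 
$\mathscr{B}(\mathscr{F}^{(2)})$, respectively. For any set $E = E^{(1)} \times E^{(2)}$ in $\mathscr{F}_0$, where 
$E^{(1)} \in \mathscr{F}^{(1)}$ and $E^{(2)} \in \mathscr{F}^{(2)}$, let us assign probability to $E$ by the 
formula 

$$P(E) = P^{(1)}(E^{(1)}) \cdot P^{(2)}(E^{(2)}). \tag{1.6.5}$$

It can be shown that $P$ is a Boolean probability measure on the Boolean field 
generated by $\mathscr{F}_0$ which can be uniquely extended to $\mathscr{B}(\mathscr{F}_0)$. 

Whenever a probability measure $P$ on the Borel field $\mathscr{B}(\mathscr{F}_0)$ 
satisfies (1.6.5) for every set $E = E^{(1)} \times E^{(2)}$ where $E^{(1)} \in \mathscr{B}(\mathscr{F}^{(1)})$ and 
$E^{(2)} \in \mathscr{B}(\mathscr{F}^{(2)})$, we say that the component probability spaces 
$(R^{(1)}, \mathscr{B}(\mathscr{F}^{(1)}), P^{(1)})$ and $(R^{(2)}, \mathscr{B}(\mathscr{F}^{(2)}), P^{(2)})$ are (statistically) independent. The 
probability space 

$$(R, \mathscr{B}(\mathscr{F}_0), P), \tag{1.6.6}$$

where $P$ is a probability measure defined by (1.6.5), is called the Cartesian 
product of the two component probability spaces 
$(R^{(1)}, \mathscr{B}(\mathscr{F}^{(1)}), P^{(1)})$ and $(R^{(2)}, \mathscr{B}(\mathscr{F}^{(2)}), P^{(2)})$. 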

The notion of statistical independence extends, of course, to more than 
two probability spaces. 
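
For finite component spaces the product formula (1.6.5) and the cylinder-set 
identity (1.6.3) can be checked by direct enumeration. The sketch below assumes 
two hypothetical component spaces, a fair coin and a fair die; the names `P1`, 
`P2`, and `prob` are illustrative. 

```python
from fractions import Fraction
from itertools import product

# Two hypothetical component spaces: a fair coin and a fair die.
P1 = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
P2 = {face: Fraction(1, 6) for face in range(1, 7)}

# Product space R = R(1) x R(2) with point probabilities P1 * P2.
P = {(e1, e2): P1[e1] * P2[e2] for e1, e2 in product(P1, P2)}

def prob(measure, event):
    """Sum the point probabilities of the sample points in the event."""
    return sum((measure[e] for e in event), Fraction(0))

# A rectangle E = E(1) x E(2) and its two cylinder sets.
E1, E2 = {"H"}, {2, 4, 6}
rectangle = set(product(E1, E2))
cylinder1 = set(product(E1, P2))   # E(1) x R(2)
cylinder2 = set(product(P1, E2))   # R(1) x E(2)
```

Here the rectangle is the intersection of the two cylinder sets, and its 
probability under `P` equals the product of the component probabilities. 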


1.7 RANDOM VARIABLES 

Let $(R, \mathscr{B}, P)$ be a probability space. Let $x(e)$ be a real, single-valued 
function defined at every sample point $e$ in $R$ such that for each real 
number $b$, the event $E_b$ in $R$, for which $x(e) \le b$, belongs to $\mathscr{B}$. The 
function $x(e)$ is called a random variable relative to $\mathscr{B}$. We sometimes say 
that $x(e)$ is $\mathscr{B}$-measurable. Note that if $\{E_b\}$ denotes the class of events 
in $R$ corresponding to all real numbers $b$, then $\mathscr{B}(\{E_b\})$ is the Borel field 
generated by this class. It is convenient to call the set of numbers $x(e)$ 
can take on for all $e \in R$ the sample space of $x(e)$. 

Thus, if $\mathscr{B}_1$ is the class of Borel sets on the real line $R_1$, the random 
variable $x(e)$ maps the sample points $e$ in $R$ into sample points $x$ in $R_1$ 
in such a way that for every Borel set $E' \in \mathscr{B}_1$ there is an event $E \in \mathscr{B}$ 
consisting of all sample points in $R$ for which $x(e) \in E'$. It is convenient 
to denote $E$ by $x^{-1}(E')$. By setting $P'(E') = P(x^{-1}(E'))$, the probability 
space $(R, \mathscr{B}, P)$ associated with the basic sample space $R$ can be used to 
define the probability space $(R_1, \mathscr{B}_1, P')$ associated with the real line $R_1$. 
In these circumstances we may say that $(R_1, \mathscr{B}_1, P')$ is induced from 
$(R, \mathscr{B}, P)$ by the random variable $x(e)$. Verification of the fact that 
$(R_1, \mathscr{B}_1, P')$ is a probability space is left as an exercise for the reader. 

If, for the given probability space $(R, \mathscr{B}, P)$, $x_1(e), \ldots, x_k(e)$ are $k$ 
random variables, the event $E_{b_1, \ldots, b_k}$ in $R$ of form 

$$E_{b_1, \ldots, b_k} = \{e : x_1(e) \le b_1, \ldots, x_k(e) \le b_k\},$$

where $b_1, \ldots, b_k$ are arbitrary real numbers, also belongs to $\mathscr{B}$ and 
$P(E_{b_1, \ldots, b_k}) = P(x_1(e) \le b_1, \ldots, x_k(e) \le b_k)$. Furthermore, if $E'$ is any 
set in $\mathscr{B}_k$ (the Borel sets of Euclidean $k$-dimensional space $R_k$), the set 
of $e$ points, say $E$, for which $(x_1(e), \ldots, x_k(e)) \in E'$ is also contained in 
$\mathscr{B}$, so that by making the assignment $P'(E') = P(E)$, we obtain a probability 
space $(R_k, \mathscr{B}_k, P')$ induced from $(R, \mathscr{B}, P)$ by $(x_1(e), \ldots, x_k(e))$. It is 
convenient to refer to $(x_1(e), \ldots, x_k(e))$ as a k-dimensional or vector 
random variable relative to $\mathscr{B}$ or as a $\mathscr{B}$-measurable $k$-dimensional 
random variable. 

If for each sample point $e$ in $R$ we have $|x(e)| < M$ where $M < +\infty$, 
then $x(e)$ is called a bounded random variable. A k-dimensional random 
variable is bounded if each of its components is bounded. 


Examples. The notion of a random variable can, perhaps, be further clarified 
by a few simple illustrations. 

In considering all possible hands of 13 cards which can be dealt from a pack of 
playing cards, the set of sample points $e$ in the basic sample space $R$ consists of 
$\binom{52}{13}$ possible hands of 13 cards. If $x(e)$ is a random variable which has a 
value equal to the number of aces contained in $e$, then $x(e)$ is defined at every 
sample point in $R$ and has the value 0, 1, 2, 3, or 4 on each point in $R$. Thus, 
$x(e)$ maps every sample point $e$ in $R$ into one of the points 0, 1, 2, 3, 4 on the real 
line $R_1$. Since the number of sample points in $R$ is finite, the class of all possible 
events in $R$ (including $R$ and the null set) forms a Boolean field $\mathscr{F}$. The event in 
$R$ for which $x(e) \le b$, for each real number $b$, is clearly one of the events in the 
class $\mathscr{F}$. If probability is assigned to each sample point in $R$ (in the case of 
“thorough” shuffling they are all assigned the value $1/\binom{52}{13}$), then we have a 
Boolean probability space $(R, \mathscr{F}, P)$ which is adequate for obtaining the 
probability for the event $E_b = \{e : x(e) \le b\}$ for any real number $b$. Thus, if $E'$ 
is any (Borel) set in $\mathscr{B}_1$, its probability is provided by the probability space 
$(R_1, \mathscr{B}_1, P')$ induced from $(R, \mathscr{F}, P)$ by $x(e)$. The only sets in $\mathscr{B}_1$ of interest here, 
of course, are the 5-point set {0, 1, 2, 3, 4} together with its subsets, that is, the 
sample space of $x(e)$ together with its subsets. 
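
The induced probabilities in this example can be written down explicitly. The 
sketch below assumes "thorough" shuffling, so that every hand has probability 
$1/\binom{52}{13}$; the function names `p_aces` and `p_at_most` are chosen for 
illustration. 

```python
from fractions import Fraction
from math import comb

TOTAL = comb(52, 13)  # number of equally likely hands

def p_aces(k):
    """P'({k}): probability that a 13-card hand holds exactly k aces."""
    return Fraction(comb(4, k) * comb(48, 13 - k), TOTAL)

def p_at_most(b):
    """P(E_b) = P(x(e) <= b) for any real b."""
    return sum((p_aces(k) for k in range(5) if k <= b), Fraction(0))
```

For example, `p_at_most(2.5)` is the probability of the event $E_b$ with 
$b = 2.5$, which is the same event as "at most two aces". 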

If $(x_1(e), x_2(e))$ is a two-dimensional random variable where $x_1(e)$ is the number 
of aces, and $x_2(e)$ is the number of spades contained in sample point $e$, we have a 
two-dimensional random variable such that the probability $P(x_1(e) \le b_1, x_2(e) \le b_2)$ 
can be computed from the Boolean probability space $(R, \mathscr{F}, P)$. By assigning 
the above probability to the interval $(-\infty, -\infty; b_1, b_2]$ for every pair of real 
numbers $(b_1, b_2)$, a probability space $(R_2, \mathscr{B}_2, P')$ is thus induced by the vector 
random variable $(x_1(e), x_2(e))$ which provides the probability of any event $E$ in $\mathscr{B}_2$. 
Again we point out that the only events $E$ of $\mathscr{B}_2$ of practical interest are those in 
the 62-point set $\{(x_1, x_2): x_1 = 0, 1, 2, 3, 4;\ x_2 = 0, 1, \ldots, 13,$ with $0 \le x_1 + x_2 \le 14$, 
excluding (4, 0) and (0, 13)$\}$, that is, the sample space of $(x_1(e), x_2(e))$, together 
with its subsets. 
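
The count of 62 points can be confirmed by enumeration. The following sketch 
counts, for each pair (aces, spades), the number of 13-card hands realizing it, 
splitting on whether the ace of spades is held; the helper `count_hands` is a 
name introduced here for illustration. 

```python
from math import comb

def count_hands(aces, spades):
    """Number of 13-card hands with the given numbers of aces and spades."""
    total = 0
    for a in (0, 1):                      # a = 1 if the ace of spades is held
        other_spades = spades - a         # chosen from 12 non-ace spades
        other_aces = aces - a             # chosen from 3 non-spade aces
        rest = 13 - aces - spades + a     # chosen from the other 36 cards
        if min(other_spades, other_aces, rest) < 0:
            continue
        total += comb(12, other_spades) * comb(3, other_aces) * comb(36, rest)
    return total

# Pairs (x1, x2) that occur in at least one hand.
support = {(x1, x2) for x1 in range(5) for x2 in range(14)
           if count_hands(x1, x2) > 0}
```

The pairs (4, 0) and (0, 13) drop out because four aces force at least one 
spade, and thirteen spades force the ace of spades. 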

Consider another example. Suppose two light bulbs $B_1$ and $B_2$ are set to burn 
continuously and their burning lives are $t_1$ and $t_2$, respectively. The basic sample 
space $R$ is the first quadrant of the $t_1 t_2$-plane and the sample points $e$ are points 
in this quadrant. The event, say $E_b$, that neither bulb burns more than $b$ units 
of time longer than the other consists of all points in $R$ for which $|t_1 - t_2| \le b$. 
Thus, we have a random variable $x(e)$ defined as $|t_1 - t_2|$ at each point $e = (t_1, t_2)$ 
in $R$. The probability that $x(e) \le b$ can be computed once a probability space 
$(R, \mathscr{B}, P)$ is set up where $\mathscr{B}$ is a Borel field containing the events $|t_1 - t_2| \le b$ 
for every real $b$, and $P$, of course, is a probability measure on $\mathscr{B}$. The random 
variable $|t_1 - t_2|$ maps the points in $R$ into the real line $R_1$, so that $P'(E'_b) = P(E_b)$ 
where $E'_b$ is the interval $(-\infty, b]$ in $R_1$ and $E_b$ is the set in $R$ for which $|t_1 - t_2| \le b$ 
and $b$ is any real number. Thus, we have an initial class of events $\{E_b : b \text{ real}\}$ 
which, together with probabilities $\{P(E_b)\}$, can be used to generate a minimal 
Borel field and probability space $(R, \mathscr{B}(\{E_b\}), P)$ from which a probability 
space $(R_1, \mathscr{B}_1, P')$ is induced by $x(e)$ and which provides the probability that 
$|t_1 - t_2|$ belongs to any set $E'$ in $\mathscr{B}_1$. 

If we let $x_1(e) = t_1 + t_2$ and $x_2(e) = t_1 - t_2$, we have two random variables 
defined at every point in $R$. A probability space $(R, \mathscr{B}(\{E_{b_1 b_2}\}), P)$, generated 
from the initial class of events $\{E_{b_1 b_2} : b_1, b_2 \text{ real}\}$ and their probabilities 
$\{P(E_{b_1 b_2})\}$, where $E_{b_1 b_2} = \{e : x_1(e) \le b_1, x_2(e) \le b_2\}$, induces a probability 
space $(R_2, \mathscr{B}_2, P')$ from $(R, \mathscr{B}, P)$ which provides the probability that 
$(t_1 + t_2, t_1 - t_2)$ belongs to any set $E'$ in $\mathscr{B}_2$. 


1.8 INTEGRATION OF RANDOM VARIABLES 

In this section we restate, in terms of probability terminology and without 
proof, some basic Lebesgue-Stieltjes integration theory. Proofs can 
be found in books such as those by Halmos (1950), Loève (1955), McShane 
and Botts (1959), and Saks (1937). 

Suppose $(R, \mathscr{B}, P)$ is a probability space and $x(e)$ is a random variable 
relative to $\mathscr{B}$. If $x(e)$ takes on only a finite number of different values 
$x_1, \ldots, x_k$ such that $x(e) = x_i$ for all $e \in E_i$, $i = 1, \ldots, k$, then $x(e)$ 
is called a simple random variable. In this case let us write 

$$\int_R x(e)\, dP(e) = \sum_{i=1}^{k} x_i P(E_i). \tag{1.8.1}$$

More generally, if $E$ is any set in $\mathscr{B}$, we write 

$$\int_E x(e)\, dP(e) = \sum_{i=1}^{k} x_i P(E \cap E_i). \tag{1.8.2}$$
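
Formula (1.8.2) translates directly into a finite sum. The sketch below assumes 
a hypothetical probability space consisting of one roll of a fair die and a 
simple random variable with three values; both are chosen only for 
illustration. 

```python
from fractions import Fraction

# Hypothetical finite probability space: one roll of a fair die.
P_point = {e: Fraction(1, 6) for e in range(1, 7)}

def x(e):
    """A simple random variable taking the values 0, 1, and 5."""
    return 0 if e <= 3 else (1 if e <= 5 else 5)

def integral(rv, E):
    """Evaluate (1.8.2): sum over values x_i of x_i * P(E ∩ E_i)."""
    values = {rv(e) for e in P_point}
    total = Fraction(0)
    for xi in values:
        E_i = {e for e in P_point if rv(e) == xi}      # set where rv = x_i
        total += xi * sum((P_point[e] for e in E & E_i), Fraction(0))
    return total
```

Calling `integral(x, set(P_point))` gives the integral over the whole sample 
space, as in (1.8.1), while passing a smaller set restricts the sum to that 
event. 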

Now consider the case where $x(e)$ is bounded and can take on infinitely 
many values. It can be shown that there exists a sequence of simple 
random variables $x_1(e), x_2(e), \ldots$ such that 

$$\lim_{\alpha\to\infty} x_\alpha(e) = x(e) \tag{1.8.3}$$

uniformly for all $e \in R$, and furthermore that for each $E \in \mathscr{B}$ 

$$\lim_{\alpha\to\infty} \int_E x_\alpha(e)\, dP(e) \tag{1.8.4}$$

exists and this limit is independent of the particular sequence that satisfies 
(1.8.3). This limit, which we denote by 

$$\int_E x(e)\, dP(e), \tag{1.8.5}$$

is called the Lebesgue-Stieltjes integral of $x(e)$, with respect to $(R, \mathscr{B}, P)$, 
over $E$. In the case of a simple random variable $x(e)$, the Lebesgue-Stieltjes 
integral of $x(e)$ over $E$ is given by (1.8.2). 
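
One standard way to obtain such a sequence is to round $x(e)$ down to the 
nearest multiple of $2^{-\alpha}$, which gives simple random variables within 
$2^{-\alpha}$ of $x(e)$ everywhere. The sketch below assumes, purely for 
illustration, that $R$ is the interval [0, 1] with the uniform probability 
measure and that $x(e) = e^2$, whose integral is 1/3; floating-point arithmetic 
is used, so the results are approximate. 

```python
from math import sqrt

def simple_integral(alpha):
    """Integral over R of the simple approximation x_alpha, using (1.8.1)."""
    n = 2 ** alpha
    total = 0.0
    for k in range(n):
        # E_k = {e : k/n <= e**2 < (k+1)/n}, an interval of length
        # sqrt((k+1)/n) - sqrt(k/n) under the uniform measure on [0, 1].
        prob = sqrt((k + 1) / n) - sqrt(k / n)
        total += (k / n) * prob
    return total

approximations = [simple_integral(alpha) for alpha in range(1, 11)]
```

Because each approximation lies below $x(e)$ by less than $2^{-\alpha}$, the 
values in `approximations` increase toward 1/3 and differ from it by less than 
$2^{-\alpha}$. 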

In the more general case where $x(e)$ is not necessarily bounded, but 
non-negative, it is integrable if there exists a sequence of bounded random 
variables $x_1(e) \le x_2(e) \le \cdots$ for all $e \in R$, such that 

$$\lim_{\alpha\to\infty} x_\alpha(e) = x(e), \quad \text{all } e \in R, \tag{1.8.6}$$

and 

$$\lim_{\alpha\to\infty} \int_E x_\alpha(e)\, dP(e) < +\infty. \tag{1.8.7}$$

This limit is the integral of $x(e)$ with respect to $(R, \mathscr{B}, P)$ over $E$ and is 
denoted by 

$$\int_E x(e)\, dP(e).$$

Under these conditions it can be shown that for any $E \in \mathscr{B}$ 

$$\lim_{\alpha\to\infty} \int_E x_\alpha(e)\, dP(e) \tag{1.8.8}$$

exists and does not depend on the particular sequence $x_1(e) \le x_2(e) \le \cdots$ 
chosen which satisfies (1.8.6) and (1.8.7). 

Finally, if $x(e)$ is arbitrary, it is said to be integrable if it can be written 
as a difference $x'(e) - x''(e)$ where $x'(e)$ and $x''(e)$ are non-negative 
integrable random variables, in which case we define 

$$\int_E x(e)\, dP(e) = \int_E x'(e)\, dP(e) - \int_E x''(e)\, dP(e). \tag{1.8.9}$$

It can be shown that the value of the left-hand side of (1.8.9) is independent 
of the particular choice of $x'(e)$ and $x''(e)$. 

The Lebesgue-Stieltjes integral has the following important properties, 
expressed in terms of random variables, where $E$ is any set in $\mathscr{B}$. 

1.8.1 If $x(e) = k$, a constant, for all $e \in R$, 

$$\int_E x(e)\, dP(e) = kP(E).$$

1.8.2 If $a \le x(e) \le b$ for all $e \in R$, where $a$ and $b$ are constants, then 

$$aP(E) \le \int_E x(e)\, dP(e) \le bP(E).$$

1.8.3 If $x_1(e)$ and $x_2(e)$ are integrable and if $a$ and $b$ are any constants, 
$ax_1(e) + bx_2(e)$ is integrable and 

$$\int_E \big(ax_1(e) + bx_2(e)\big)\, dP(e) = a\int_E x_1(e)\, dP(e) + b\int_E x_2(e)\, dP(e).$$

1.8.4 If $x_1(e) \le x(e) \le x_2(e)$ for all $e \in R$ where $x(e)$, $x_1(e)$, and $x_2(e)$ 
are integrable, then 

$$\int_E x_1(e)\, dP(e) \le \int_E x(e)\, dP(e) \le \int_E x_2(e)\, dP(e).$$

1.8.5 $x(e)$ is integrable if and only if $|x(e)|$ is integrable, and furthermore, 

$$\Big|\int_E x(e)\, dP(e)\Big| \le \int_E |x(e)|\, dP(e).$$

1.8.6 If $x_1(e), x_2(e), \ldots$ is a sequence of random variables such that, for 
all $e \in R$, 

$$\lim_{\alpha\to\infty} x_\alpha(e) = x(e), \quad \text{and} \quad |x_\alpha(e)| \le y(e),$$

where $y(e)$ is integrable, then $x(e)$ is integrable and 

$$\lim_{\alpha\to\infty} \int_R |x_\alpha(e) - x(e)|\, dP(e) = 0.$$

In particular 

$$\lim_{\alpha\to\infty} \int_E x_\alpha(e)\, dP(e) = \int_E x(e)\, dP(e)$$

uniformly for all $E \in \mathscr{B}$. 

1.8.7 If $(R_k, \mathscr{B}_k, P')$ is the probability space induced from the probability 
space $(R, \mathscr{B}, P)$ by the vector random variable $(x_1(e), \ldots, x_k(e))$, if 
$g(x_1, \ldots, x_k)$ is measurable with respect to $\mathscr{B}_k$ (and thus $g(x_1(e), \ldots, x_k(e))$ 
is measurable with respect to $\mathscr{B}$), and if $E$ is the set in $R$ for 
which $(x_1(e), \ldots, x_k(e)) \in E'_k$, where $E'_k \in \mathscr{B}_k$, then we have 

$$\int_E g(x_1(e), \ldots, x_k(e))\, dP(e) = \int_{E'_k} g(x_1, \ldots, x_k)\, dP'(x_1, \ldots, x_k)$$

in the sense that if either integral is finite so is the other and the two 
are equal. 

Finally, we remark that if, in 1.8.1 through 1.8.6, we replace the phrase 
“all $e \in R$” by “all $e \in R$ except possibly for a set $F$ for which $P(F) = 0$,” 
the conclusions remain unchanged. 

1.9 CONDITIONAL PROBABILITY 

Suppose $(R, \mathscr{B}, P)$ is a probability space and let $E_1$ and $E_2$ be events in 
$\mathscr{B}$ such that $P(E_1) > 0$. Let us write 

$$P(E_2 \mid E_1) = \frac{P(E_1 \cap E_2)}{P(E_1)}. \tag{1.9.1}$$

This ratio is called the conditional probability of event $E_2$ given that $E_1$ 
occurs. 

It can be verified that for any fixed $E_1$ in $\mathscr{B}$ such that $P(E_1) > 0$, 
$(R, \mathscr{B}, P(\cdot \mid E_1))$ is a probability space, where $P(\cdot \mid E_1)$ is a measure which 
takes the value $P(E_2 \mid E_1)$ on $E_2 \in \mathscr{B}$. 
Note that we can rewrite (1.9.1) as follows: 

$$P(E_1 \cap E_2) = P(E_1) \cdot P(E_2 \mid E_1). \tag{1.9.2}$$

If $P(E_2 \mid E_1) = P(E_2)$, then (1.9.2) reduces to the case of independent 
events $E_1$ and $E_2$, and we have 

$$P(E_1 \cap E_2) = P(E_1)P(E_2). \tag{1.9.2a}$$

More generally, it can be shown that: 

1.9.1 If $E_1, E_2, \ldots$ is a finite or countably infinite sequence of events in $\mathscr{B}$ 
having probability measure $P$ such that $P(E_1) > 0$, $P(E_1 \cap E_2) > 0$, 
$P(E_1 \cap E_2 \cap E_3) > 0, \ldots$, then 

$$P(E_1 \cap E_2 \cap \cdots) = P(E_1) \cdot P(E_2 \mid E_1) \cdot P(E_3 \mid E_1 \cap E_2) \cdots P(E_n \mid E_1 \cap E_2 \cap \cdots \cap E_{n-1}) \cdots. \tag{1.9.3}$$

In the case of mutual independence of $E_1, E_2, \ldots$, (1.9.3) reduces to 

$$P(E_1 \cap E_2 \cap E_3 \cap \cdots) = P(E_1)P(E_2)P(E_3) \cdots. \tag{1.9.3a}$$

Another useful result is the following: 

1.9.2 If $E_1, E_2, \ldots$ is a finite or countably infinite sequence of disjoint 
events in $\mathscr{B}$ having nonzero probabilities, where $\bigcup_{\alpha} E_\alpha = R$, and 
if $E$ is any set in $\mathscr{B}$, then 

$$P(E) = P(E_1)P(E \mid E_1) + P(E_2)P(E \mid E_2) + \cdots. \tag{1.9.4}$$

Verification of this statement is left to the reader. 
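
A finite example may make (1.9.1), (1.9.2a), and (1.9.4) concrete. The sketch 
below assumes a hypothetical sample space of 36 equally likely outcomes for two 
dice; the events and the helper `P_given` are illustrative choices, not part of 
the text. 

```python
from fractions import Fraction

# Hypothetical sample space: two fair dice, all 36 outcomes equally likely.
R = {(i, j) for i in range(1, 7) for j in range(1, 7)}

def P(E):
    return Fraction(len(E), len(R))

def P_given(E2, E1):
    """Conditional probability (1.9.1); requires P(E1) > 0."""
    return P(E1 & E2) / P(E1)

E = {e for e in R if sum(e) == 8}                        # total is 8
# Disjoint events with union R: partition by the first die.
partition = [{e for e in R if e[0] == i} for i in range(1, 7)]

# Right-hand side of (1.9.4).
total_probability = sum(P(Ei) * P_given(E, Ei) for Ei in partition)

# Independence check (1.9.2a) for "first die is 3" and "second die is even".
A = {e for e in R if e[0] == 3}
B = {e for e in R if e[1] % 2 == 0}
```

Here `total_probability` reproduces `P(E)`, and the events `A` and `B` satisfy 
the product rule (1.9.2a). 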




1.10 CONDITIONAL RANDOM VARIABLES 

For a given probability space $(R, \mathscr{B}, P)$, suppose $(x_1(e), x_2(e))$ is a 
two-dimensional random variable relative to $\mathscr{B}$. If sets $E_1$ and $E_2$ in (1.9.1) 
are chosen, respectively, as the sets for which $x_1(e) \in E'_1$ and $x_2(e) \in E'_2$, 
where $E'_1$ and $E'_2$ are sets in $\mathscr{B}_1$, then (1.9.1) becomes a conditional 
probability formula concerning random variables. It gives the conditional 
probability of $x_2(e) \in E'_2$ given that $x_1(e) \in E'_1$. In particular, suppose $E'_1$ is 
a single point, say $x_1$. If $P(x_1(e) = x_1) > 0$ there is no difficulty with (1.9.1). 
But if $P(x_1(e) = x_1) = 0$ the question arises: is there some sense in which 
we can give meaning to the conditional probability $P(x_2(e) \in E'_2 \mid x_1(e) = x_1)$? 

In most cases of interest in mathematical statistics we shall show in 
Section 2.9 that we can give meaning to this conditional probability by 
fairly elementary considerations. But under more general conditions, an 
answer to the question is provided by the Radon-Nikodym theorem which 
may be stated as follows: 

1.10.1 If $(R, \mathscr{B}, P)$ is a probability space, and if $Q$ is a (finite) completely 
additive set function on $\mathscr{B}$ such that $Q(E) = 0$ for every set $E \in \mathscr{B}$ 
for which $P(E) = 0$, then there exists a random variable $g(e)$ such 
that 

$$Q(E) = \int_E g(e)\, dP \tag{1.10.1}$$

for every $E \in \mathscr{B}$. Furthermore, if $g(e)$ and $h(e)$ are two such random 
variables, then $P(g(e) \ne h(e)) = 0$. 

We shall omit the proof of this theorem, referring the reader to Halmos 
(1950) for a slightly more general formulation and proof. It should be 
particularly noted that the values of Q are not restricted to be non-negative. 

If two completely additive set functions $P$ and $Q$ defined on $\mathscr{B}$ are 
such that $Q(E) = 0$ for every set $E$ for which $P(E) = 0$, then $Q$ is said to be 
absolutely continuous with respect to $P$. 

Suppose $(R, \mathscr{B}, P)$ is a probability space and $x(e)$ is a random variable. 
Suppose, for a given $E \in \mathscr{B}$, and a fixed number $x_1$, we want to find the 
conditional probability $P(E \mid x(e) = x_1)$. As we have seen, definition 
(1.9.1) becomes meaningless if $P(x(e) = x_1) = 0$. 

Intuitively, however, we would like to define this conditional probability 
as a sort of “limit” of $P(E \mid x(e) \in N_\alpha(x_1))$ as $\alpha \to \infty$, where $N_1(x_1), 
N_2(x_1), \ldots$ is a sequence of neighborhoods of $x_1$ converging to the point 
$x_1$. In general, this limit will not exist everywhere. But it is intuitively 
plausible that if it does exist in a well-behaved domain containing $F$, 
where $F \in \mathscr{B}_1$, one should be able to obtain $P(E \mid x(e) \in F)$ by suitably 
averaging $P(E \mid x(e) = x_1)$ over all possible $x_1 \in F$. 

In general, it is difficult to establish such a limit as that mentioned above 
except under special conditions. However, Kolmogorov (1933a) has 
suggested the following approach. 

For any $E \in \mathscr{B}$ and $F \in \mathscr{B}_1$ let 

$$Q(F) = P(E \cap x^{-1}(F)). \tag{1.10.2}$$

If $P(x^{-1}(F)) = 0$, $Q(F) = 0$, and it is clear that $Q$ is a completely additive 
set function on $\mathscr{B}_1$. By 1.10.1 there is a real-valued $\mathscr{B}_1$-measurable 
function $f(x)$ such that 

$$Q(F) = \int_F f(x)\, dP'(x), \tag{1.10.3}$$

where $P'(F) = P(x^{-1}(F))$ for all sets $F \in \mathscr{B}_1$. Therefore, if we write 
$g(e) = f(x(e))$ we have 

$$P(E \cap x^{-1}(F)) = \int_{x^{-1}(F)} g(e)\, dP(e), \tag{1.10.4}$$

and $g(e)$, that is, $f(x(e))$, is “the conditional probability of $E$ given $x(e)$”. 

It should be noted that $f(x(e))$ is unique in the sense of 1.10.1. From this 
definition of “conditional probability of $E$ given $x(e)$” it can be shown 
that, except possibly for a set in $R_1$ of probability zero, we have 

$$f(x_1) = \lim_{\alpha\to\infty} P(E \mid x(e) \in N_\alpha(x_1)), \tag{1.10.5}$$

where $\{N_\alpha(x_1), \alpha = 1, 2, \ldots\}$ is a sequence of measurable neighborhoods 
of $x_1$ which converge to $x_1$. 

The approach discussed above extends in a straightforward manner to the 
case of a vector random variable $(x_1(e), \ldots, x_k(e))$. 
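
In a discrete setting the function $f$ of (1.10.3) can be computed from (1.9.1), 
and the identity (1.10.4) can be checked directly. The sketch below assumes a 
hypothetical space of two fair dice with $x(e)$ equal to the first die, so that 
every value of $x$ has positive probability and no appeal to 1.10.1 is actually 
needed; it is meant only to show what the symbols in (1.10.2) to (1.10.4) stand 
for. 

```python
from fractions import Fraction

# Hypothetical discrete space: two fair dice; x(e) is the first die.
R = [(i, j) for i in range(1, 7) for j in range(1, 7)]
P_point = {e: Fraction(1, 36) for e in R}

def P(A):
    return sum((P_point[e] for e in A), Fraction(0))

def x(e):
    return e[0]

E = {e for e in R if sum(e) >= 9}          # the event being conditioned

def f(x1):
    """f(x1) = P(E | x(e) = x1); well defined here since P(x = x1) > 0."""
    level = {e for e in R if x(e) == x1}
    return P(E & level) / P(level)

def g(e):
    """g(e) = f(x(e)), the conditional probability of E given x(e)."""
    return f(x(e))

def check(F):
    """Return both sides of (1.10.4) for a set F of values of x."""
    preimage = {e for e in R if x(e) in F}                # x^{-1}(F)
    lhs = P(E & preimage)
    rhs = sum((g(e) * P_point[e] for e in preimage), Fraction(0))
    return lhs, rhs
```

For any set `F` of die values, the two numbers returned by `check(F)` agree, 
which is the discrete counterpart of (1.10.4). 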


PROBLEMS 

1.1 Two names are picked “at random” from a list of N different names and 
alphabetized. Describe the sample space generated by this operation and state 
how many sample points it contains. Generalize to the case where n names are 
drawn “at random” from the list of N names. 

1.2 The birthdays (month and day of month) of two persons A and B picked 
“at random” from Who's Who are recorded. Ignoring leap years, describe the 
sample space generated by this operation and state the number of sample points 
in it. Give the numbers of sample points in the following events: 

(a) “A and B have identical birthdays.” 

(b) “A’s and B’s birthdays are not more than r days apart.” 

(c) “A’s and B’s birthdays are in different months.” 

1.3 A store opens at 9 a.m. and closes at 5 p.m. A shopper taken “at random” 
walks into this store at time x and out at time y (both x and y being measured in 
hours on the time axis with 9 a.m. as origin). Describe the sample space of (x, y). 
Describe, in terms of x and y, the following events: 

(a) “The shopper is in the store less than one hour.” 

(b) “The shopper is in the store at time 2 .” 

(c) “The shopper went into the store before time u and out after time v.” 

1.4 A box of N light bulbs has r bulbs (r < N) with broken filaments and a 
person tests them one by one until a defective bulb (that is, one with a broken 
filament) is found, observing only whether a bulb lights up or not when tested. 
Describe the sample space generated by this operation. How many points are in 
the sample space? Generalize to the case where bulbs are tested one by one until 
exactly s defectives are found. 

1.5 Let $R$ be the set of all students at University A. Let $E_1$ denote the set of 
all students in A who subscribe to magazine $M_1$, $E_2$ the set who subscribe to 
magazine $M_2$, and $E_3$ the set who subscribe to magazine $M_3$. 

(a) Describe, in words, the following sets: 

£1 u u Ej; El n n £3; E — (Ej u E^\ 

£1 u u £3; El n E^ n £3; £1 n n £3. 

(b) Express in terms of operations on $R$, $E_1$, $E_2$, $E_3$ the following sets: 

(i) The students who subscribe to two or more of the three magazines. 

(ii) The students who subscribe to not more than one of the three 
magazines. 

1.6 Let R be the set of all possible hands of 13 cards in an ordinary pack of 
playing cards. Let $E_1$, $E_2$, $E_3$, $E_4$ be the sets of different hands of 13 cards 
containing, respectively, the ace of spades, ace of hearts, ace of diamonds, and 
ace of clubs. Describe in words the following sets and state how many hands 
there are in each set: 

El n E^\ El - (Ei n E 3 ); (Ei n £i n E 3 ) u E 4 ; 

El vj (Ei ^ E 3 ): (El vj Ei) £ 3 ^ £1 vj £|j £1 o (^ kj £ 3 )^ 

(El u Ei u E 3 ) - E 4 ; E - [(El vj Ei) n (E 3 u £ 4 )]. 

1.7 The game of craps is played with a pair of ordinary 6-sided dice as
follows. If the shooter rolls 7 or 11 he wins without further throwing. If he
rolls 2, 3, or 12 he loses without further throwing. If he rolls 4, 5, 6, 8, 9, or 10
he must continue throwing until he gets a 7 or the number initially thrown.
If 7 appears first he loses. If the point he initially threw appears he wins.
Describe the sample space involved, and assuming true dice show that the
probability the shooter wins is 244/495.
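
The value 244/495 in Problem 1.7 can be checked by exact arithmetic. The following is a minimal sketch, not a substitute for describing the sample space: it assumes true dice and uses the fact that, once a point has been thrown, only rolls of that point or of 7 matter. The function name is our own.

```python
from fractions import Fraction
from itertools import product


def craps_win_probability():
    # Count the number of ways each total can occur with two true dice.
    ways = {}
    for a, b in product(range(1, 7), repeat=2):
        ways[a + b] = ways.get(a + b, 0) + 1

    total = Fraction(0)
    for s, w in ways.items():
        p_first = Fraction(w, 36)
        if s in (7, 11):
            total += p_first            # immediate win
        elif s in (2, 3, 12):
            continue                    # immediate loss
        else:
            # Point s: the shooter wins if s appears before 7.
            # Rolls that are neither s nor 7 do not affect the outcome.
            total += p_first * Fraction(w, w + ways[7])
    return total


print(craps_win_probability())
```

Exact fractions are used so that the comparison with 244/495 involves no rounding.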

1.8 If a sample space R contains N sample points, show that the total number
of events in the Boolean field generated by these N points is $2^N$.

1.9 Consider the two infinite sequences of sets $E_1, E_2, \ldots$ and $F_1, F_2, \ldots$ in
the xy-plane where $E_n$ is the set of points for which $x^2 + y^2 < (1 + n)/n$ and
$F_n$ is the set of points for which $x^2 + y^2 < n/(1 + n)$. Let $\bar{E}_n$ be the complement
of $E_n$ with respect to the entire xy-plane, with a similar definition of $\bar{F}_n$. Describe,
in terms of x and y, the following sets:

lim Ea; lim lim E*; lim £«; lim E„ n E^. 
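
1.10 If $E_1, \ldots, E_n$ are arbitrary events in a sample space R whose
complements are $\bar{E}_1, \ldots, \bar{E}_n$, show that

    $\bigcup_{\alpha=1}^{n} E_\alpha$  and  $\bigcap_{\alpha=1}^{n} \bar{E}_\alpha$

are disjoint and that their union is R, and hence that

    $P\left(\bigcup_{\alpha=1}^{n} E_\alpha\right) = 1 - P\left(\bigcap_{\alpha=1}^{n} \bar{E}_\alpha\right).$
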
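1.11 Events $E_1, \ldots, E_n$ are such that the probability of the occurrence of
any specified r of them is $p_r$, r = 1, ..., n. Show:

(a) That the probability of the occurrence of one or more of the events
$E_1, \ldots, E_n$ is

    $\binom{n}{1} p_1 - \binom{n}{2} p_2 + \cdots + (-1)^{n-1} \binom{n}{n} p_n.$

(b) That the probability of the occurrence of m or more of the events
$E_1, \ldots, E_n$ is

    $\binom{m-1}{m-1}\binom{n}{m} p_m - \binom{m}{m-1}\binom{n}{m+1} p_{m+1} + \cdots + (-1)^{n-m} \binom{n-1}{m-1}\binom{n}{n} p_n.$

(c) That the probability of the occurrence of exactly m of the events
$E_1, \ldots, E_n$ is

    $\binom{m}{m}\binom{n}{m} p_m - \binom{m+1}{m}\binom{n}{m+1} p_{m+1} + \cdots + (-1)^{n-m} \binom{n}{m}\binom{n}{n} p_n.$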
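
The alternating sums of Problem 1.11 can be checked numerically in a special case. The sketch below assumes (our choice, not part of the problem) that the n events are independent, each with probability q, so that the probability that any specified r of them all occur is $p_r = q^r$; the number of events occurring is then binomial, which gives an independent value to compare against. The sums used are $\sum_k (-1)^{k-m} \binom{k-1}{m-1} \binom{n}{k} p_k$ for "m or more" and $\sum_k (-1)^{k-m} \binom{k}{m} \binom{n}{k} p_k$ for "exactly m", with k running from m to n.

```python
from fractions import Fraction
from math import comb


def at_least(m, n, p):
    """P(m or more of the n events occur); p[r] is the probability
    that any specified r of the events all occur."""
    return sum((-1) ** (k - m) * comb(k - 1, m - 1) * comb(n, k) * p[k]
               for k in range(m, n + 1))


def exactly(m, n, p):
    """P(exactly m of the n events occur)."""
    return sum((-1) ** (k - m) * comb(k, m) * comb(n, k) * p[k]
               for k in range(m, n + 1))


# Assumed special case: independent events, each of probability q.
n, q = 5, Fraction(1, 3)
p = [q ** r for r in range(n + 1)]
binom = [comb(n, j) * q ** j * (1 - q) ** (n - j) for j in range(n + 1)]

for m in range(1, n + 1):
    assert exactly(m, n, p) == binom[m]
    assert at_least(m, n, p) == sum(binom[m:])
```

Part (a) is the case m = 1 of `at_least`. Agreement in this special case is a consistency check only, not a proof.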

1.12 A company manufacturing cornflakes puts a card numbered 1 or 2
or, ..., or r at random in each package, all numbers being equally likely to be
drawn. If n ($\ge r$) boxes of cornflakes are purchased, show that the probability of
being able to assemble at least one complete set of cards from the packages is

    $1 - \binom{r}{1}\left(1 - \frac{1}{r}\right)^n + \binom{r}{2}\left(1 - \frac{2}{r}\right)^n - \cdots + (-1)^{r-1}\binom{r}{r-1}\left(1 - \frac{r-1}{r}\right)^n.$


1.13 If an urn has N chips numbered 1, 2, ..., N and if two chips are drawn
successively (without replacement) let e be any point in the sample space R
generated by this operation. Let x(e) be the absolute difference between the
numbers on the two chips which yield e. If all points e in the sample space R are
assigned equal probabilities, show that x(e) is a random variable, describe its
sample space and write down the formula for P(x(e) = x').


1.14 In Problem 1.4 suppose all possible sequences in which the N bulbs can
be tested are assigned equal probabilities. What is the probability that the sth
defective bulb ($s \le r$) to be found will occur with the testing of the xth bulb
tested? State the range of values of x for which the required probability is
positive.

1.15 Suppose a coin is thrown successively n times and that e is a sample
point among the $2^n$ points in the sample space R generated by this operation. Let x(e)
be the number of heads in e. If the sample points in R are all assigned equal
probabilities, show that x(e) is a random variable, describe its sample space, and
write down the expression for P(x(e) = x').

1.16 Suppose is a sample space whose elements e are the points inside the 
square with vertices (0,0), (0,1), (1, 0), and (1, 1 ) in the iit;-plane. For any 
point e having coordinates (i/, v) let x(e) ^ u If is any triangle or 
quadrilateral in R and if P(E) » area of E, show that x{e) is a random variable 
for which P(x(e) < x') can be computed for every real x\ describe its sample 
space, and write down the expression for P(x{e) < x') as a function of x\ If F 

is the event in R for which ujl <v < u, compute the value of I x{€) dP. 

r JF 

Compute the value of I x(e) dP, 

JR 

1.17 (Continuation) Let y(e) = u/v. Show that y(e) is a random variable
such that $P(y(e) \le y')$ can be computed for every real y', and find the expression
for $P(y(e) \le y')$.

1.18 (Continuation) Show that (x(e), y(e)) is a two-dimensional random
variable such that $P(x(e) \le x', y(e) \le y')$ can be computed for each real x' and
y', and write down the expression for $P(x(e) \le x', y(e) \le y')$.

1.19 {Continuation) Let « = 1, 2,.. . be triangles in R for which 
M>0, t?>0, w+t;<(l 4- n)lln. What is lim Show that lim F(Fn) “ 

n->oo n-^-oo 

pQ™ ^«) = Show that the infinite sequence Fj, jE^, ... satisfies formula 
(1.9.3). 


1.20 Prove 1.9.1 and 1.9.2. 

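1.21 Suppose $(R, \mathcal{B}, P)$ is a probability space and x(e) is a non-negative
random variable relative to $\mathcal{B}$. Let $I_{\delta,\alpha}$ be the interval $(\alpha\delta, (\alpha + 1)\delta]$, $\alpha = 0, 1, 2, \ldots$
where $\delta > 0$, and $E_{\delta,\alpha}$ the set in $\mathcal{B}$ for which $x(e) \in I_{\delta,\alpha}$. For $F \in \mathcal{B}$ let

    $A(\delta) = \sum_{\alpha=1}^{\infty} \alpha\delta \, P(F \cap E_{\delta,\alpha}).$

If $A(\delta)$ is finite for some value of $\delta$, show that $\lim_{\delta \to 0} A(\delta)$ exists and is the
Lebesgue-Stieltjes integral of x(e) over F.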

1.22 Let $E_1, \ldots, E_n$ be arbitrary events and let $p_m(\alpha_1, \ldots, \alpha_k)$ be the
probability that at least m of the events $E_{\alpha_1}, \ldots, E_{\alpha_k}$ occur. Show that

    $(k + 1 - m) \sum_{k+1} p_m(\alpha_1, \ldots, \alpha_{k+1}) \le (n - k) \sum_{k} p_m(\alpha_1, \ldots, \alpha_k),$

$k = 1, \ldots, n - 1$, $1 \le m \le k$, where $\sum_i$, $i = k, k + 1$, denotes summation
over all possible selections of i integers from 1, ..., n. (Chung (1941)).

Distribution Functions


2.1 PRELIMINARY REMARKS 

As we have seen in Section 1.8 a (one-dimensional) random variable
x(e) induces a probability space $(R_1, \mathcal{B}_1, P')$ associated with the real
line $R_1$ from the basic probability space $(R, \mathcal{B}, P)$. Similarly, a k-dimensional
random variable $(x_1(e), \ldots, x_k(e))$ induces a probability space
$(R_k, \mathcal{B}_k, P')$ associated with Euclidean space $R_k$ from the basic probability
space $(R, \mathcal{B}, P)$.

We shall usually drop the reference to event points e in R in dealing
with a random variable x(e) and call it the random variable* x. If x is
the (one-dimensional) random variable whose probability space is
$(R_1, \mathcal{B}_1, P')$ and if E' is any set in $\mathcal{B}_1$, it will sometimes avoid ambiguity
to denote the event E' by $x \in E'$ and to use the notation $P(x \in E')$ instead
of $P'(E')$. In particular, if E' is the interval (a, b], we shall understand that
$P(x \in E')$ may be written as $P(a < x \le b)$; if E' consists of a single point
x', we may write P(x = x'), which is the probability that the value x'
is realized by the random variable x in a given trial. If x is a k-dimensional
vector random variable $(x_1, \ldots, x_k)$ and if it is desirable to indicate the
dimensionality of x to avoid ambiguity we shall denote an event E' in
$\mathcal{B}_k$ by $(x_1, \ldots, x_k) \in E'$, and use $P((x_1, \ldots, x_k) \in E')$ rather than
$P(x \in E')$ or $P'(E')$. If E' is the k-dimensional interval $(a_1, \ldots, a_k; b_1, \ldots, b_k]$,
denoted by $(a, b]_k$, it will be understood that $P((x_1, \ldots, x_k) \in (a, b]_k)$
can also be written as $P(a_i < x_i \le b_i,\ i = 1, \ldots, k)$.

In discussing a k-dimensional random variable $(x_1, \ldots, x_k)$, we shall

* Ideally, it would be desirable to denote random variables by bold face letters or by
some other characteristic marking. This is, however, hardly practical in a book
containing many applications of random variable theory involving many different symbols
(some of them classical) designating random variables. It will be made clear in the text
whenever a quantity under discussion is, in fact, a random variable.
sometimes find it convenient to refer to it as “the random variables
$x_1, \ldots, x_k$” and sometimes as “the k-dimensional random variable with
components $x_1, \ldots, x_k$.”

We shall be concerned frequently with bounded sets in $R_k$. A set E'
in $R_k$ is bounded if it is contained in some finite k-dimensional interval
$(a, b]_k$, that is, where $a_1, \ldots, a_k, b_1, \ldots, b_k$ are all finite. If a random
variable $(x_1, \ldots, x_k)$ has the property that $P((x_1, \ldots, x_k) \in (a, b]_k) = 1$,
where $(a, b]_k$ is finite, then $(x_1, \ldots, x_k)$ is said to be bounded with
probability 1.


2.2 DISTRIBUTION FUNCTIONS OF ONE-DIMENSIONAL 
RANDOM VARIABLES 

Suppose x is a one-dimensional random variable whose probability
space* is $(R_1, \mathcal{B}_1, P)$. We shall show that the allocation or distribution of
probability over the sample space of x in $R_1$ can also be described by a
distribution function F(x) defined at each point in $R_1$ and having certain
properties.

For any interval $(-\infty, x']$ on $R_1$ which, of course, belongs to $\mathcal{B}_1$, let
F(x') be defined as follows:

(2.2.1)    $F(x') = P(-\infty < x \le x').$

F(x) is clearly a single-valued, real, and non-negative function of x in $R_1$.
If x'' > x' we have from 1.4.2

(2.2.2)    $F(x'') - F(x') = P(-\infty < x \le x'') - P(-\infty < x \le x') = P(x' < x \le x'') \ge 0.$

Hence F(x) is a nondecreasing function of x.

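If we denote the interval $(-\infty, \alpha]$ by $E_\alpha$ then we have the following
contracting sequence of sets

    $E_{-1} \supset E_{-2} \supset \cdots$

for which it is clear that $\lim_{\alpha \to \infty} E_{-\alpha} = \phi$. Hence we have from 1.4.5

    $F(-\infty) = \lim_{\alpha \to \infty} F(-\alpha) = \lim_{\alpha \to \infty} P(E_{-\alpha}) = P\left(\lim_{\alpha \to \infty} E_{-\alpha}\right) = P(\phi) = 0,$

that is,

(2.2.3)    $F(-\infty) = 0.$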

* From now on, unless otherwise indicated, E (not E') will denote a set in $\mathcal{B}_1$ or more
generally in $\mathcal{B}_k$ and we shall drop the dash on P.
Similarly, we have the expanding sequence

    $E_1 \subset E_2 \subset \cdots$

for which $\lim_{\alpha \to \infty} E_\alpha = R_1$. Therefore from 1.4.5, we have

    $F(+\infty) = \lim_{\alpha \to \infty} F(\alpha) = \lim_{\alpha \to \infty} P(E_\alpha) = P\left(\lim_{\alpha \to \infty} E_\alpha\right) = P(R_1) = 1,$

that is,

(2.2.4)    $F(+\infty) = 1.$


Now consider a decreasing sequence of real numbers $x_1, x_2, \ldots$ such that
$\lim_{\alpha \to \infty} x_\alpha = x'$. Then $E_{x_1}, E_{x_2}, \ldots$ is a contracting sequence of sets having
$E_{x'}$ as their limit. Again by applying 1.4.5, we have

    $\lim_{\alpha \to \infty} F(x_\alpha) = \lim_{\alpha \to \infty} P(E_{x_\alpha}) = P\left(\lim_{\alpha \to \infty} E_{x_\alpha}\right) = P(E_{x'}) = F(x'),$

that is,

(2.2.5)    $F(x' + 0) = F(x').$

In other words, dropping dashes, the function F(x) is continuous on the
right at each value of x.

The reader should observe that (2.2.5) is purely a consequence of
defining F(x') as the probability contained in the half-closed interval
$(-\infty, x']$. If we had chosen to define F(x') as $P(-\infty < x < x')$, then we
would have had F(x' - 0) = F(x'), that is, F(x) would have been continuous
on the left at each value of x. Hence the definition of F(x') as the
probability contained in the half-closed interval $(-\infty, x']$ rather than the open
interval $(-\infty, x')$ and having consequence (2.2.5) should be viewed as a
convention.

Thus, if x is a random variable having probability space $(R_1, \mathcal{B}_1, P)$,
there exists a function F(x) defined by (2.2.1) (dropping the dashes), at
every point x in $R_1$, and having properties (2.2.2) through (2.2.5).

Conversely, if a function F(x) defined by (2.2.1) and having properties
(2.2.2) through (2.2.5) is given, there exists a probability space $(R_1, \mathcal{B}_1, P)$.

For we may consider as our initial class of sets in $R_1$ the class of
half-open intervals $(-\infty, x]$ for all real numbers x, and take the probability
assigned to $(-\infty, x]$ as F(x). This initial class of sets generates a Borel
field which is, by definition, the class $\mathcal{B}_1$. Utilizing properties
(2.2.2) through (2.2.5) of F(x), it can be shown that a unique probability
measure can be constructed from F(x) on the Borel field generated by
this initial class of sets.

Summarizing, we have the following result:

2.2.1 The probability space $(R_1, \mathcal{B}_1, P)$ of a random variable x uniquely
determines a single-valued, real, and non-negative function F(x)
defined by (2.2.1) for every point x in $R_1$ having the following
properties:

(2.2.6)    (a) $F(x'') - F(x') \ge 0$, if $x'' > x'$
           (b) $F(-\infty) = 0$
           (c) $F(+\infty) = 1$
           (d) $F(x + 0) = F(x)$.

Conversely, a function F(x) having these properties uniquely determines
a probability space $(R_1, \mathcal{B}_1, P)$ with $P(E_x) = F(x)$.

F(x) is called the distribution function (d.f.) or the cumulative distribution
function (c.d.f.) of the (one-dimensional) random variable x. We shall
ordinarily use the latter term. If one thinks of a total probability of 1 being
distributed along the x-axis then F(x) is simply the fraction of the
probability lying on $(-\infty, x]$.
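
The four properties of a c.d.f. can be illustrated on a concrete step-function. The sketch below is only an illustration under our own assumptions: x is taken to be the number of dots on a fair six-sided die, the limits at infinity are replaced by evaluation far outside the mass points, and the one-sided limits are approximated with a small fixed offset `eps` rather than a true limit.

```python
from fractions import Fraction

# Assumed example: x = number of dots on a fair six-sided die.
MASS = {k: Fraction(1, 6) for k in range(1, 7)}


def F(x):
    """F(x) = P(-inf < X <= x): total mass at points not exceeding x."""
    return sum((p for pt, p in MASS.items() if pt <= x), Fraction(0))


# (a) F is nondecreasing on a grid of points from -2 to 9.
grid = [Fraction(i, 4) for i in range(-8, 37)]
nondecreasing = all(F(a) <= F(b) for a, b in zip(grid, grid[1:]))

# (d) Continuity on the right, and a jump on the left, at each mass point.
eps = Fraction(1, 10**6)
right_continuous = all(F(k + eps) == F(k) for k in MASS)
left_jump = all(F(k) - F(k - eps) == MASS[k] for k in MASS)

print(F(-100), F(100), nondecreasing, right_continuous, left_jump)
```

The jump on the left at each mass point is the consequence of using the half-closed interval $(-\infty, x]$ in the definition.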

Thus, we have two alternative ways of describing the probabilities
associated with a (one-dimensional) random variable x. One is by means of
a probability space $(R_1, \mathcal{B}_1, P)$ and the other is by means of the c.d.f. F(x)
defined at every point in $R_1$, satisfying the conditions expressed in (2.2.6).
The c.d.f. description is more convenient in the analysis of random
variables for most purposes, and will be used almost entirely from now on.
It should be noted that if we are given a basic sample space R and a random
variable x(e) defined on points e in R there exists a c.d.f. F(x) of this
random variable defined by

    $P(x(e) \le x) = F(x).$

Conversely, suppose we are given a c.d.f. F(x) with no reference to a basic
sample space R. We can always define a random variable x whose c.d.f.
is F(x) by considering the real axis as the basic sample space R. The
sample points e will then be the real numbers and for any given real
number x', we assign our random variable x(e) the value x', that is,
x(x') = x'. Then we have $P(x(e) \le x') = F(x')$.

The probability $P(x \in E)$ where E is any set in $\mathcal{B}_1$ exists and can be
determined from F(x). For instance, the probability $P(x' < x \le x'')$ is
determined from F(x) by formula (2.2.2). Other useful probability
statements concerning the random variable x which can be verified by the
reader, using methods similar to those involved in obtaining (2.2.5), are
the following:

(2.2.7)     $P(x = x') = F(x') - F(x' - 0)$

(2.2.8)     $P(x' < x < x'') = F(x'' - 0) - F(x')$

(2.2.9)     $P(x' \le x < x'') = F(x'' - 0) - F(x' - 0)$

(2.2.10)    $P(x' \le x \le x'') = F(x'') - F(x' - 0).$
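
The interval formulas all follow one pattern: a closed upper end uses F at that point and an open one uses the left-hand limit, while a closed lower end subtracts the left-hand limit and an open one subtracts F. A minimal sketch, assuming a small three-point distribution of our own choosing and computing the left-hand limit F(x - 0) directly as P(X < x):

```python
from fractions import Fraction

# Hypothetical p.f. with mass points 0, 1, 2.
MASS = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def F(x):
    """F(x) = P(X <= x)."""
    return sum((p for pt, p in MASS.items() if pt <= x), Fraction(0))


def F_left(x):
    """F(x - 0), the left-hand limit of F at x, which equals P(X < x)."""
    return sum((p for pt, p in MASS.items() if pt < x), Fraction(0))


def prob(lo, hi, closed_lo, closed_hi):
    """Probability of the interval from lo to hi, with each end open or closed."""
    upper = F(hi) if closed_hi else F_left(hi)
    lower = F_left(lo) if closed_lo else F(lo)
    return upper - lower


print(prob(1, 1, True, True))     # P(x = 1), the single-point case
print(prob(0, 2, False, False))   # P(0 < x < 2)
print(prob(0, 2, True, False))    # P(0 <= x < 2)
print(prob(0, 2, True, True))     # P(0 <= x <= 2)
```

For a continuous c.d.f. the two functions `F` and `F_left` coincide and all four interval probabilities are equal.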

Note that if x is a bounded random variable, then there exist finite
numbers a, b with a < b where a is the largest number for which F(a) = 0 and b
is the smallest number for which F(b) = 1; b - a is called the range of x.

2.3 COMMON TYPES OF ONE-DIMENSIONAL
RANDOM VARIABLES

Most one-dimensional random variables which arise in mathematical
statistics belong to one of two types: the discrete type and the continuous
type. The distribution function F(x) and, of course, the probability
measure P(E), for these two types of random variables can be defined
in terms of alternative, if not more primitive, functions as we shall see
presently.

(a) The Discrete Type 

In the discrete type the c.d.f. F(x) is a step-function, that is, its value
changes only at a finite or a countably infinite number of points
$x^{(1)}, x^{(2)}, \ldots$ in $R_1$, having no finite limit point, at which jumps or saltuses of
size $p(x^{(1)}), p(x^{(2)}), \ldots$ occur. The saltus $p(x^{(\alpha)})$ is given by (2.2.7) as

(2.3.1)    $p(x^{(\alpha)}) = P(x = x^{(\alpha)}) = F(x^{(\alpha)}) - F(x^{(\alpha)} - 0).$

At all other points x' in $R_1$, p(x') = P(x = x') = 0. F(x) can be expressed in
terms of these saltuses as follows:

(2.3.2)    $F(x) = \sum_{\alpha} p(x^{(\alpha)})$

where the summation extends over all values of $\alpha$ for which $x^{(\alpha)} \le x$.
Letting $x \to +\infty$ in (2.3.2) it is seen that we must have

(2.3.3)    $F(+\infty) = \sum_{\alpha} p(x^{(\alpha)}) = 1.$

Since the total probability 1 is distributed among the points $x^{(1)}, x^{(2)}, \ldots$
it is customary to refer to these points as probability points or
mass points. This set of points comprises the sample space of a random
variable x having a c.d.f. given by (2.3.2).

In dealing with a discrete random variable x it will be clear from the
context what the mass points of x are. Hence there will be no ambiguity
if we drop $\alpha$ and simply write p(x). The function p(x) will be called the
probability function (p.f.) of x.

We may summarize as follows:

2.3.1 The c.d.f. F(x) of a discrete random variable x is uniquely determined
by the p.f. p(x) and conversely.

In dealing with problems involving a discrete random variable, it is
usually more convenient to work with the p.f. p(x) than with the c.d.f. F(x).
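
The two directions of this correspondence can be sketched directly: cumulative sums of the p.f. over the ordered mass points give the c.d.f., and successive differences of the c.d.f. at the mass points recover the p.f. The mass points and probabilities below are hypothetical values chosen for illustration.

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

# Hypothetical mass points (in increasing order) and their probabilities.
points = [1, 2, 5]
pf = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]

# p.f. -> c.d.f.: value of F at each mass point.
cum = list(accumulate(pf))


def F(x):
    """Step-function c.d.f.: sum of p over mass points not exceeding x."""
    idx = bisect_right(points, x)
    return cum[idx - 1] if idx else Fraction(0)


# c.d.f. -> p.f.: the saltus at each mass point.
jumps = [cum[0]] + [b - a for a, b in zip(cum, cum[1:])]

print(F(0), F(1), F(3), F(5), jumps == pf)
```

`bisect_right` counts the mass points not exceeding x, which matches the half-closed convention $(-\infty, x]$.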




Fig. 2.2 Graph of p.f. p(x) corresponding to the c.d.f. F(x) graphed in Fig. 2.1.

The c.d.f. F(x) of a one-dimensional discrete random variable x can be
represented graphically as the graph of a step-function whose jumps of
magnitude $p(x^{(1)}), p(x^{(2)}), \ldots$ occur at the mass points $x^{(1)}, x^{(2)}, \ldots$
respectively, as shown in Fig. 2.1.

The p.f. p(x) of the random variable whose c.d.f. is represented in Fig.
2.1 can be represented graphically by vertical line segments of lengths
$p(x^{(1)}), p(x^{(2)}), \ldots$ located at the mass points $x^{(1)}, x^{(2)}, \ldots$, respectively,
and zero elsewhere, as shown in Fig. 2.2.

If F(x) has only one saltus, say occurring at the mass point $x^{(1)}$,
then x is called a degenerate (one-dimensional) random variable, and
we shall denote its c.d.f. by

(2.3.4)    $\varepsilon(x - x^{(1)}) = \begin{cases} 1, & x \ge x^{(1)} \\ 0, & x < x^{(1)}. \end{cases}$

It should be noted that the c.d.f. F(x) given by (2.3.2) can be expressed
in terms of the $\varepsilon$-function (2.3.4) as follows:

(2.3.5)    $F(x) = \sum_{\alpha} p(x^{(\alpha)}) \cdot \varepsilon(x - x^{(\alpha)}).$

Examples. Examples of one-dimensional discrete random variables are
plentiful in elementary probability theory. For instance, if x is the random
variable denoting the number of dots occurring in a throw of a single “true”
6-sided die, then the mass points are $x^{(1)} = 1, \ldots, x^{(6)} = 6$ and probabilities
are assigned so that the p.f. is given by p(x) = 1/6, x = 1, ..., 6 and p(x) = 0 for
all other values of x. If x is the number of aces occurring in a single hand of 13
cards dealt from a “well-shuffled” pack of 52 ordinary playing cards, then
$x^{(1)} = 0, x^{(2)} = 1, \ldots, x^{(5)} = 4$ and probabilities are assigned so that the p.f.
is given by

    $p(x) = \binom{4}{x}\binom{48}{13 - x} \Big/ \binom{52}{13}, \quad x = 0, 1, \ldots, 4,$

and p(x) = 0 for all other values of x.

Further examples of important discrete random variables will be discussed in
Chapter 6.
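
Both probability functions in these examples are easy to tabulate exactly. The sketch below assumes the usual counting argument for the card example: a hand with x aces is formed by choosing x of the 4 aces and 13 - x of the 48 other cards, out of all $\binom{52}{13}$ equally likely hands. The names `die`, `aces`, and `aces_pf` are our own.

```python
from fractions import Fraction
from math import comb

# p.f. of the number of dots on a true six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}


def aces(x):
    """P(exactly x aces in a 13-card hand from a well-shuffled pack of 52)."""
    return Fraction(comb(4, x) * comb(48, 13 - x), comb(52, 13))


aces_pf = {x: aces(x) for x in range(5)}

# Each p.f. must distribute a total probability of 1 over its mass points.
print(sum(die.values()), sum(aces_pf.values()))
for x, p in aces_pf.items():
    print(x, p, float(p))
```

Summing the p.f. over the mass points gives 1 in each case, as (2.3.3) requires.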


(b) The Continuous Type 

For the continuous type of random variable there exists a Lebesgue-measurable
function $f(x) \ge 0$ such that

(2.3.6)    $F(x') = \int_{-\infty}^{x'} f(y)\,dy$

for all $x' \in R_1$. In this case dF/dx exists and

(2.3.7)    $\frac{dF(x)}{dx} = f(x)$

except possibly for a set of values of x of probability 0. As a matter of
fact a function f(x) exists which satisfies (2.3.6) and (2.3.7) if and only if
F(x) is absolutely continuous, and a random variable having such a c.d.f.
is sometimes called an absolutely continuous random variable.

Sometimes we have occasion to deal with a random variable x having
merely a continuous c.d.f. F(x), in which case F(x) would not be assumed to
satisfy (2.3.6) and (2.3.7). This more general type of random variable
should not be confused with the case in which F(x) is absolutely continuous.
The occasions in which F(x) is merely continuous rather than absolutely
continuous are rare, however, and it is customary to drop the adjective
“absolutely” and in such a case say that x is a continuous random variable.
At any rate it will always be clear from the context of a situation which
type of continuity is involved. It will be seen, for instance, that a good
deal of the sampling theory underlying order statistics and nonparametric
inference in Chapters 8, 11, and 13, holds for the case where F(x) is merely
continuous.

Consider the expression

(2.3.8)    $\frac{F(x'') - F(x')}{x'' - x'}.$

The ratio (2.3.8) is non-negative and represents the average probability per
unit length contained in (x', x'']. If the limit of the ratio exists as $x'' \to x'$,
we obtain $f(x') \ge 0$ which may be thought of as a density of probability at
the point x = x'. Accordingly, we shall call f(x) the probability density
function (p.d.f.) of the random variable x. It may be useful to summarize
as follows:


2.3.2 The c.d.f. F(x) of a continuous random variable x is uniquely
determined by its p.d.f. f(x) in accordance with (2.3.6). Conversely, f(x)
is determined by F(x) in accordance with (2.3.7) except possibly for
a set of values of x of probability 0.

If (2.3.7) holds, then for any (Borel) set E on $R_1$ we have

(2.3.9)    $P(x \in E) = \int_E f(x)\,dx.$

It follows at once from (2.3.9) that if $E = R_1$, then $\int_{R_1} f(x)\,dx = 1$.
It is sometimes convenient to use ordinary differential notation and write

(2.3.10)    $P(x' < x < x' + dx) = f(x')\,dx$

understanding, of course, the usual caution which must be exercised in
dealing with differentials. The quantity f(x) dx (dropping the dash) is
called the probability element (p.e.) of x. If F(x) is merely continuous, it
is still convenient to denote $P(x' < x < x' + dx)$ by dF(x') and refer to
it as the p.e. of x at x = x'.

For any number p on the interval (0, 1), the pth quantile or fractile
(100pth percentile) $x_p$ of the continuous random variable x having c.d.f.
F(x) is defined by the smallest number $x_p$ for which

(2.3.11)    $F(x_p) = p.$

In particular, $x_{0.5}$ is the median of x, and $x_{0.25}$ and $x_{0.75}$ are the lower
quartile and upper quartile, respectively, of x. In the case of a discrete
random variable x, if there is at least one value of x for which F(x) = p, the
pth quantile is the smallest of such values. Thus quantiles in this case are
defined only at the mass points of the random variable.
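
For a continuous c.d.f. the pth quantile can be located numerically by bisection, since F is nondecreasing. The sketch below is one possible approach under our own assumptions: the c.d.f. $F(x) = x^2$ on [0, 1] is a hypothetical example, and the search returns (to within a tolerance) the smallest x with $F(x) \ge p$, which for a continuous F is the smallest x with F(x) = p.

```python
def quantile(F, p, lo, hi, tol=1e-12):
    """Approximate the smallest x in [lo, hi] with F(x) >= p by bisection.

    Assumes F is continuous and nondecreasing with F(lo) < p <= F(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) >= p:
            hi = mid        # the quantile is at mid or to its left
        else:
            lo = mid        # the quantile is to the right of mid
    return hi


def F(x):
    """Hypothetical continuous c.d.f.: F(x) = x^2 on [0, 1]."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return x * x


median = quantile(F, 0.5, 0.0, 1.0)
lower_quartile = quantile(F, 0.25, 0.0, 1.0)
upper_quartile = quantile(F, 0.75, 0.0, 1.0)
print(lower_quartile, median, upper_quartile)
```

Here the median is the square root of 1/2 and the lower quartile is 1/2, which gives a direct check on the search.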



Fig. 2.3 Graph of the c.d.f. F(x) of a one-dimensional continuous random variable x.



In dealing with problems involving a continuous random variable x, it is 
usually more convenient to work with the p.d.f. f{x) than with the c.d.f. 
F(x). 

The c.d.f. F(x) of a one-dimensional continuous random variable and the
corresponding p.d.f. f(x) are represented in Figs. 2.3 and 2.4 respectively.
In Fig. 2.3 the value of $P(x' < x \le x'')$ is represented by the difference
between the two ordinates F(x') and F(x''), whereas in Fig. 2.4 the same
probability is represented by the shaded area.

Example. We shall deal with several important special probability
distributions of the continuous type in Chapter 7. It may be useful, however, to give the
following simple example. Suppose the probability that a “random” point lies
inside any circle of radius d which in turn lies within a given circle C of radius r
is $d^2/r^2$, and we are interested in the distance of the “random” point from the
center of circle C. If we set up a random variable x to denote the distance
between the “random” point and the center of the given circle and for a given
x' define F(x') as $P(x \le x')$, then the c.d.f. of x (assuming the point falls in C)
is given by

    $F(x) = \begin{cases} 1, & x > r \\ x^2/r^2, & 0 < x \le r \\ 0, & x \le 0 \end{cases}$

and the p.d.f. of x is given by

    $f(x) = \begin{cases} 2x/r^2, & 0 < x < r \\ 0, & x \le 0,\ x > r. \end{cases}$
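
The relation between the c.d.f. $x^2/r^2$ and the p.d.f. $2x/r^2$ in this example can be checked numerically. The sketch below integrates the density with a simple midpoint rule (our choice of method, with an arbitrary number of subintervals) and compares the result with the c.d.f.; the radius r = 2 is a hypothetical value.

```python
def F(x, r):
    """c.d.f. of the distance of the random point from the center of C."""
    if x <= 0:
        return 0.0
    if x > r:
        return 1.0
    return x * x / (r * r)


def f(x, r):
    """p.d.f. of the distance: the derivative of F inside (0, r)."""
    return 2 * x / (r * r) if 0 < x < r else 0.0


def integrate(g, a, b, n=10000):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


r = 2.0
for x in (0.5, 1.0, 1.5, 2.0):
    area = integrate(lambda t: f(t, r), 0.0, x)
    print(x, area, F(x, r))
```

Because the density is linear on (0, r), the midpoint rule is exact there up to rounding, so the two columns agree closely.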

It can be shown that the most general form of F(x) is a convex
combination of a discrete c.d.f. $F_1(x)$ and a continuous (not necessarily absolutely
continuous) c.d.f. $F_2(x)$, that is, $F(x) = aF_1(x) + bF_2(x)$, where a and b are
non-negative and a + b = 1.

2.4 DISTRIBUTION FUNCTIONS OF TWO-DIMENSIONAL 
RANDOM VARIABLES 


(a) General Properties 

Suppose $(x_1, x_2)$ is the two-dimensional random variable having
probability space $(R_2, \mathcal{B}_2, P)$. Let $E_{(x_1', x_2')}$ be the interval $(-\infty, -\infty; x_1', x_2']$ in
$R_2$. Let

(2.4.1)    $F(x_1', x_2') = P(E_{(x_1', x_2')}) = P(-\infty < x_1 \le x_1',\ -\infty < x_2 \le x_2').$

$F(x_1, x_2)$ is clearly a single-valued, real, and non-negative function of
$(x_1, x_2)$ in $R_2$.

Any interval $I_2$ of the form $(x_1', x_2'; x_1'', x_2'']$ belongs to $\mathcal{B}_2$ since

(2.4.2)    $I_2 = E_{(x_1'', x_2'')} - \left(E_{(x_1', x_2'')} \cup E_{(x_1'', x_2')}\right).$

Furthermore, the probability that $(x_1, x_2) \in I_2$ is seen to be

(2.4.3)    $P((x_1, x_2) \in I_2) = F(x_1'', x_2'') - F(x_1', x_2'') - F(x_1'', x_2') + F(x_1', x_2').$

It will be convenient to call the expression on the right of (2.4.3)
$\Delta^2_{I_2} F(x_1, x_2)$, the second difference of $F(x_1, x_2)$ over $I_2$. We then have

(2.4.4)    $P((x_1, x_2) \in I_2) = \Delta^2_{I_2} F(x_1, x_2) \ge 0.$
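
The second difference is a four-term inclusion-exclusion over the corners of the rectangle: the c.d.f. at the upper-right corner, minus its values at the two adjacent corners, plus its value at the lower-left corner. A minimal sketch, assuming as a hypothetical example a pair of independent random variables each uniform on (0, 1), whose c.d.f. is the product of the clamped coordinates:

```python
def second_difference(F, lower, upper):
    """Probability of the half-open rectangle (a1, b1] x (a2, b2],
    computed from the bivariate c.d.f. F at its four corners."""
    (a1, a2), (b1, b2) = lower, upper
    return F(b1, b2) - F(a1, b2) - F(b1, a2) + F(a1, a2)


def clamp(t):
    """Restrict t to the interval [0, 1]."""
    return min(max(t, 0.0), 1.0)


def F_uniform(x1, x2):
    """Assumed example c.d.f.: independent uniforms on (0, 1)."""
    return clamp(x1) * clamp(x2)


# Rectangle (0.2, 0.5] x (0.1, 0.7]: its area, 0.3 * 0.6, is its probability.
print(second_difference(F_uniform, (0.2, 0.1), (0.5, 0.7)))
```

For this example the probability of a rectangle inside the unit square is simply its area, which gives a direct check on the four-term formula.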

Figure 2.5 relates to the various quantities in $\Delta^2_{I_2} F(x_1, x_2)$. $F(x_1', x_2'')$ and
$F(x_1'', x_2')$ are the probabilities contained in the infinite regions (including
their boundaries) shaded with vertical and horizontal lines, respectively.
$F(x_1', x_2')$ is the probability contained in the doubly-shaded region (including
upper and right boundaries) and $\Delta^2_{I_2} F(x_1, x_2)$ is the probability contained
in the nonshaded rectangular region (including the upper and right
boundaries).


Fig. 2.5 Diagram relating to the formula for $\Delta^2_{I_2} F(x_1, x_2)$.

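Now consider the sets

    $E_{(-\alpha, x_2')}, \quad \alpha = 1, 2, \ldots$

which is clearly a contracting sequence of sets whose limit $\lim_{\alpha \to \infty} E_{(-\alpha, x_2')} = \phi$.
Therefore, it follows from 1.4.5 that

    $\lim_{\alpha \to \infty} F(-\alpha, x_2') = \lim_{\alpha \to \infty} P(E_{(-\alpha, x_2')}) = P\left(\lim_{\alpha \to \infty} E_{(-\alpha, x_2')}\right) = P(\phi) = 0.$

Hence, dropping the dash,

(2.4.5)     $F(-\infty, x_2) = 0.$

Similarly,

(2.4.5a)    $F(x_1, -\infty) = 0.$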

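Now consider the sets

    $E_{(\alpha, \alpha)}, \quad \alpha = 1, 2, \ldots.$

This is an expanding sequence such that $\lim_{\alpha \to \infty} E_{(\alpha, \alpha)} = R_2$. Hence we have
from 1.4.5

    $\lim_{\alpha \to \infty} F(\alpha, \alpha) = \lim_{\alpha \to \infty} P(E_{(\alpha, \alpha)}) = P\left(\lim_{\alpha \to \infty} E_{(\alpha, \alpha)}\right) = P(R_2) = 1,$

that is,

(2.4.6)    $F(+\infty, +\infty) = 1.$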

By argument similar to that used in establishing (2.2.5) it can be shown
that $F(x_1, x_2)$ is continuous on the right in each variable; that is, at each
point $(x_1, x_2)$ in $R_2$,

(2.4.7)    $F(x_1 + 0, x_2) = F(x_1, x_2 + 0) = F(x_1, x_2).$

From (2.4.7) it can be verified that $F(x_1 + 0, x_2 + 0) = F(x_1, x_2)$.

Hence, if $(x_1, x_2)$ is a random variable having probability space $(R_2, \mathcal{B}_2, P)$,
a unique function $F(x_1, x_2)$ exists, defined by (2.4.1) (dropping dashes) at
every point $(x_1, x_2)$ in $R_2$, and having properties (2.4.4) through (2.4.7).

Conversely, as in the one-dimensional case, a function $F(x_1, x_2)$ defined
by (2.4.1) and having properties (2.4.4) through (2.4.7) uniquely determines
a probability space $(R_2, \mathcal{B}_2, P)$. Summarizing, we have the
two-dimensional analogue of 2.2.1:

2.4.1 The probability space $(R_2, \mathcal{B}_2, P)$ of a two-dimensional random
variable $(x_1, x_2)$ uniquely determines a single-valued, real, and
non-negative function $F(x_1, x_2)$ defined by (2.4.1) at each point $(x_1, x_2)$ in
$R_2$ having the following properties:

(2.4.8)    (a) $\Delta^2_{I_2} F(x_1, x_2) \ge 0$
           (b) $F(-\infty, x_2) = F(x_1, -\infty) = 0$
           (c) $F(+\infty, +\infty) = 1$
           (d) $F(x_1 + 0, x_2) = F(x_1, x_2 + 0) = F(x_1, x_2)$.

Conversely, a function $F(x_1, x_2)$ having these properties uniquely
determines a probability space $(R_2, \mathcal{B}_2, P)$ such that $P(E_{(x_1, x_2)}) = F(x_1, x_2)$.

$F(x_1, x_2)$ is called the distribution function or cumulative distribution
function of the two-dimensional random variable $(x_1, x_2)$. It is sometimes
convenient to say that $F(x_1, x_2)$ is the c.d.f. of the two random variables $x_1$
and $x_2$; $F(x_1, x_2)$ is also referred to as a bivariate c.d.f.

As in the case of a one-dimensional random variable, we have two
alternative schemes for describing the distribution of probabilities
associated with a two-dimensional random variable: a probability space and a
c.d.f. We will use the latter almost entirely.

A variety of two-dimensional analogues of formulas (2.2.7) through
(2.2.10) can be set up and verified by the reader. In particular, we have

(2.4.9)    $P(x_1 = x_1', x_2 = x_2') = F(x_1', x_2') - F(x_1' - 0, x_2') - F(x_1', x_2' - 0) + F(x_1' - 0, x_2' - 0).$

(b) Marginal Distributions 
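Consider the sets

    $E_{(x_1', \alpha)}, \quad \alpha = 1, 2, \ldots.$

They constitute an expanding sequence such that

    $\lim_{\alpha \to \infty} E_{(x_1', \alpha)} = E_{(x_1', +\infty)}$

where $E_{(x_1', +\infty)}$ is the event $x_1 \le x_1'$ in $R_2$. Hence by applying 1.4.5 we have

    $\lim_{\alpha \to \infty} F(x_1', \alpha) = \lim_{\alpha \to \infty} P(E_{(x_1', \alpha)}) = P\left(\lim_{\alpha \to \infty} E_{(x_1', \alpha)}\right).$

That is,

(2.4.10)    $F(x_1', +\infty) = P(E_{(x_1', +\infty)}) = P(x_1 \le x_1').$

Dropping the dashes, let

(2.4.11)    $F(x_1, +\infty) = F_1(x_1).$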

It can be verified by the reader that $F_1(x_1)$ satisfies all the conditions
(2.2.6) (a) through (d) of the c.d.f. of a one-dimensional random variable.
In fact, $F_1(x_1)$ is the c.d.f. of the component $x_1$ of the random variable
$(x_1, x_2)$ and is called the marginal c.d.f. of $x_1$, or simply the c.d.f. of $x_1$.
Similarly,

(2.4.12)    $F_2(x_2) = F(+\infty, x_2)$

is the marginal c.d.f. of $x_2$. If one thinks of a total probability of 1 being
distributed in the $(x_1, x_2)$ plane in accordance with the c.d.f. $F(x_1, x_2)$, and
if this probability is orthogonally projected onto the $x_1$-axis $R_1^{(1)}$, then
$F_1(x_1)$ is the amount of probability lying in $(-\infty, x_1]$ on the $x_1$-axis. A
similar interpretation holds, of course, for $F_2(x_2)$.

It follows from 2.2.1 that the marginal c.d.f.'s $F_1(x_1)$ and $F_2(x_2)$ determine
probability spaces $(R_1^{(1)}, \mathcal{B}_1^{(1)}, P^{(1)})$ and $(R_1^{(2)}, \mathcal{B}_1^{(2)}, P^{(2)})$ respectively.

(c) Statistically Independent Random Variables 

If $(x_1, x_2)$ is a two-dimensional random variable whose probability space
is $(R_2, \mathcal{B}_2, P)$ and if the components $x_1$ and $x_2$ are one-dimensional random
variables having probability spaces $(R_1^{(1)}, \mathcal{B}_1^{(1)}, P^{(1)})$ and $(R_1^{(2)}, \mathcal{B}_1^{(2)}, P^{(2)})$,
respectively, then $x_1$ and $x_2$ are said to be statistically independent (see
Section 1.6) if for every set E in $R_2 = R_1^{(1)} \times R_1^{(2)}$ of the form $E^{(1)} \times E^{(2)}$
where $E^{(1)}$ and $E^{(2)}$ are (Borel) sets in $R_1^{(1)}$ and $R_1^{(2)}$ respectively, we have

    $P(E) = P^{(1)}(E^{(1)}) \cdot P^{(2)}(E^{(2)}).$

Dropping the adjective “statistically” and referring merely to
independent random variables will not cause ambiguity.

Independence of $x_1$ and $x_2$ can be more usefully expressed in terms of
c.d.f.'s as follows:

2.4.2 If $(x_1, x_2)$ is a random variable having c.d.f. $F(x_1, x_2)$, a necessary and
sufficient condition for $x_1$ and $x_2$ to be independent is that

(2.4.13)    $F(x_1, x_2) = F_1(x_1) \cdot F_2(x_2)$

where $F_1(x_1)$ and $F_2(x_2)$ are the marginal c.d.f.'s of $x_1$ and $x_2$.

To establish 2.4.2 we denote, as usual, the sample spaces of $x_1$ and $x_2$
by $R_1^{(1)}$ and $R_1^{(2)}$, respectively. Then

(2.4.14)    $R_2 = R_1^{(1)} \times R_1^{(2)}.$

Let $E_{x_1'}$ and $E_{x_2'}$ be the events $x_1 \le x_1'$ and $x_2 \le x_2'$ in $R_1^{(1)}$ and $R_1^{(2)}$,
respectively. Then

(2.4.15)    $E_{(x_1', x_2')} = E_{x_1'} \times E_{x_2'}.$

If $x_1$ and $x_2$ are independent, we have

(2.4.16)    $P(E_{(x_1', x_2')}) = P^{(1)}(E_{x_1'}) \cdot P^{(2)}(E_{x_2'}),$

that is,

(2.4.17)    $F(x_1', x_2') = F_1(x_1') \cdot F_2(x_2').$

Conversely, suppose $F(x_1, x_2)$, $F_1(x_1)$, and $F_2(x_2)$ are c.d.f.'s such that
(2.4.17) holds for every point $(x_1', x_2')$ in $R_2$, that is to say, (2.4.16) holds
for every set $E_{(x_1', x_2')} = E_{x_1'} \times E_{x_2'}$. Then it follows from Section 1.6 that
a probability space $(R_2, \mathcal{B}_2, P)$ is uniquely determined in $R_2$ which has
the property that for any set E in $\mathcal{B}_2$ of form $E^{(1)} \times E^{(2)}$, we have
$P(E) = P^{(1)}(E^{(1)}) \cdot P^{(2)}(E^{(2)})$ where $(R_1^{(1)}, \mathcal{B}_1^{(1)}, P^{(1)})$ and $(R_1^{(2)}, \mathcal{B}_1^{(2)}, P^{(2)})$
are probability spaces determined by $F_1(x_1)$ and $F_2(x_2)$ respectively, which
is equivalent to the statement that $x_1$ and $x_2$ are independent.
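
The factorization criterion can be tried out on small discrete examples. The sketch below is an illustration under our own assumptions: the joint distribution is given as a dictionary of mass points, the marginal c.d.f.'s are obtained by letting the other argument reach the largest mass point (standing in for $+\infty$), and the product condition is checked only on the grid of mass-point coordinates, which suffices for a step-function c.d.f. The two example distributions are hypothetical.

```python
from fractions import Fraction
from itertools import product


def joint_cdf(pf, x1, x2):
    """F(x1, x2): total mass at points (a, b) with a <= x1 and b <= x2."""
    return sum((p for (a, b), p in pf.items() if a <= x1 and b <= x2),
               Fraction(0))


def is_independent(pf):
    """Check F(x1, x2) = F1(x1) * F2(x2) on the grid of mass-point coordinates."""
    xs = sorted({a for a, _ in pf})
    ys = sorted({b for _, b in pf})
    top1, top2 = xs[-1], ys[-1]          # stand-ins for +infinity
    return all(
        joint_cdf(pf, x, y) == joint_cdf(pf, x, top2) * joint_cdf(pf, top1, y)
        for x, y in product(xs, ys)
    )


# Two separate true dice: the components are independent.
two_dice = {(a, b): Fraction(1, 36) for a, b in product(range(1, 7), repeat=2)}

# First die together with the larger of the two dice: not independent.
die_and_max = {}
for a, b in product(range(1, 7), repeat=2):
    key = (a, max(a, b))
    die_and_max[key] = die_and_max.get(key, 0) + Fraction(1, 36)

print(is_independent(two_dice), is_independent(die_and_max))
```

In the second example the condition already fails at the point (1, 1), where the joint c.d.f. is 1/36 but the product of the marginals is 1/216.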


2.5 COMMON TYPES OF TWO-DIMENSIONAL 
RANDOM VARIABLES 

In probability and statistics, most two-dimensional random variables 
are of three types: discrete, continuous, and mixed, although the mixed 
type occurs much less frequently than the other two. It is worthwhile to 
discuss these in some detail. 

(a) The Discrete Type 

For a discrete random variable $(x_1, x_2)$, $F(x_1, x_2)$ is a step-function such
that the right-hand side of (2.4.9) is zero except at a finite or countably
infinite number of mass points $(x_1^{(\alpha)}, x_2^{(\alpha)})$, $\alpha = 1, 2, \ldots$, having no finite
limit point in $R_2$. At these mass points we have

(2.5.1)    $P(x_1 = x_1^{(\alpha)}, x_2 = x_2^{(\alpha)}) = p(x_1^{(\alpha)}, x_2^{(\alpha)})$

and furthermore

(2.5.2)    $\sum_{\alpha} p(x_1^{(\alpha)}, x_2^{(\alpha)}) = 1.$

Conversely, if we are given a sequence of points $(x_1^{(\alpha)}, x_2^{(\alpha)})$ such that
$p(x_1^{(\alpha)}, x_2^{(\alpha)}) > 0$, $\alpha = 1, 2, \ldots$, and $p(x_1, x_2) = 0$ for all other points in
$R_2$, then $F(x_1, x_2)$ is the step-function defined by

(2.5.3)    $F(x_1, x_2) = \sum_{\alpha} p(x_1^{(\alpha)}, x_2^{(\alpha)}),$

the summation extending over all values of $\alpha$ for which $x_i^{(\alpha)} \le x_i$, i = 1, 2.
The random variable $(x_1, x_2)$ is degenerate if there is only one mass point
$(x_1^{(1)}, x_2^{(1)})$, in which case $p(x_1^{(1)}, x_2^{(1)}) = 1$. We may write

(2.5.4)    $F(x_1, x_2) = \varepsilon(x_1 - x_1^{(1)}, x_2 - x_2^{(1)}) = \begin{cases} 1, & x_i \ge x_i^{(1)},\ i = 1, 2 \\ 0, & \text{otherwise.} \end{cases}$


Of course, it is possible for only one of the components of $(x_1, x_2)$, say $x_1$,
to be a degenerate random variable with a c.d.f. as defined in (2.3.4).
Unless otherwise indicated we shall assume that neither component of
$(x_1, x_2)$ is degenerate.

There will be no ambiguity if we drop $\alpha$ and call $p(x_1, x_2)$ the probability
function (p.f.) of $(x_1, x_2)$.

Then we have the result that

2.5.1 The c.d.f. $F(x_1, x_2)$ of a discrete random variable $(x_1, x_2)$ is uniquely
determined by the p.f. $p(x_1, x_2)$, and conversely.

In dealing with a discrete random variable $(x_1, x_2)$ it is usually more
convenient to deal with $p(x_1, x_2)$ than $F(x_1, x_2)$.

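In general, if $E$ is any set in $R_2$, we have

$P((x_1, x_2) \in E) = \sum p(x_1^{(\alpha)}, x_2^{(\alpha)}),$

the summation extending over all mass points lying in $E$. In particular, the
marginal c.d.f. $F_1(x_1)$ for any value of $x_1$, say $x_1'$, is given by

(2.5.5)   $F_1(x_1') = P(x_1 \le x_1') = \sum p(x_1^{(\alpha)}, x_2^{(\alpha)}),$

where the summation extends over the mass points in the set in $R_2$ for which
$x_1 \le x_1'$. $F_2(x_2)$ is similarly defined.

The marginal c.d.f.'s $F_1(x_1)$ and $F_2(x_2)$ are discrete one-dimensional
c.d.f.'s, which have marginal p.f.'s $p_1(x_1)$ and $p_2(x_2)$, respectively.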

The reader can readily verify the following statement:

2.5.2 If $(x_1, x_2)$ is a discrete random variable with p.f. $p(x_1, x_2)$, a necessary
and sufficient condition for $x_1$ and $x_2$ to be independent is that
$p(x_1, x_2) \equiv p_1(x_1) \cdot p_2(x_2)$.

Example. Examples of two-dimensional discrete random variables are
plentiful in elementary probability theory. For instance, if $x_1$ denotes the
number of aces and $x_2$ the number of kings occurring in a hand of 13 bridge cards,
the mass points $(x_1^{(\alpha)}, x_2^{(\alpha)})$, $\alpha = 1, 2, \ldots, 25$, of the random variable $(x_1, x_2)$ are
(0,0), (0,1), (1,0), ..., (4,4). Under conditions of “perfect” shuffling, that is,
assuming the possible hands are all assigned equal probabilities, the p.f. of
$(x_1, x_2)$ is

$p(x_1, x_2) = \dfrac{\binom{4}{x_1} \binom{4}{x_2} \binom{44}{13 - x_1 - x_2}}{\binom{52}{13}}$

for $(x_1, x_2)$ = (0,0), (0,1), ..., (4,4) and, of course, $p(x_1, x_2) = 0$ for all other
points in $R_2$. The marginal p.f. of $x_1$ is given by

$p_1(x_1) = \sum_{x_2 = 0}^{4} p(x_1, x_2) = \dfrac{\binom{4}{x_1} \binom{48}{13 - x_1}}{\binom{52}{13}},$

which, of course, is the p.f. of the random variable $x_1$, the number of aces
obtained. A similar expression holds for $p_2(x_2)$. Note that $x_1$ and $x_2$ are not
statistically independent since $p(x_1, x_2) \not\equiv p_1(x_1) \cdot p_2(x_2)$.
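
The claims of this example can be checked by direct enumeration. The sketch below is an illustration only; it assumes the hypergeometric form of the joint p.f. implied by equally likely hands (choose the aces, the kings, and the remaining cards from the other 44), and the function names `p` and `p1` are introduced here for convenience. It verifies that the probabilities sum to 1, that the marginal p.f. of the number of aces has the closed form with 48 non-aces, and that the joint p.f. does not factor into the product of the marginals.

```python
from fractions import Fraction
from math import comb

def p(x1, x2):
    """Joint p.f. of (number of aces, number of kings) in a 13-card hand,
    assuming all hands are equally likely."""
    if not (0 <= x1 <= 4 and 0 <= x2 <= 4):
        return Fraction(0)
    return Fraction(comb(4, x1) * comb(4, x2) * comb(44, 13 - x1 - x2),
                    comb(52, 13))

def p1(x1):
    """Marginal p.f. of the number of aces: the joint p.f. summed over x2."""
    return sum(p(x1, x2) for x2 in range(5))

total = sum(p(a, k) for a in range(5) for k in range(5))
closed_form_ok = all(
    p1(a) == Fraction(comb(4, a) * comb(48, 13 - a), comb(52, 13))
    for a in range(5)
)
# By symmetry the marginal p.f. of the number of kings is the same function p1.
independent = all(p(a, k) == p1(a) * p1(k) for a in range(5) for k in range(5))
print(total, closed_form_ok, independent)
```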


Further cases of two-dimensional discrete random variables will arise 
in Chapter 6. 


(b) The Continuous Type 


In the case of a continuous two-dimensional random variable $(x_1, x_2)$,
there exists a Lebesgue-measurable function $f(x_1, x_2) \ge 0$ such that

(2.5.6)   $F(x_1, x_2) = \int_{-\infty}^{x_1} \int_{-\infty}^{x_2} f(y_1, y_2)\, dy_2\, dy_1$

for all $(x_1, x_2) \in R_2$, in which case $\dfrac{\partial^2 F(x_1, x_2)}{\partial x_1\, \partial x_2}$ exists and

(2.5.7)   $\dfrac{\partial^2 F(x_1, x_2)}{\partial x_1\, \partial x_2} = f(x_1, x_2)$

at all points in $R_2$ except possibly for a set of probability 0.

Conditions on $f(x_1, x_2)$ under which (2.5.7) is valid are two-dimensional
analogues of those stated for the case of a one-dimensional random
variable; the details are left to the reader.

Now let us examine the concept of probability density for the
two-dimensional random variable $(x_1, x_2)$.

If we set up the ratio

(2.5.8)   $\dfrac{\Delta^2_{I_2} F(x_1, x_2)}{(x_1'' - x_1')(x_2'' - x_2')}, \qquad x_1'' > x_1',\ x_2'' > x_2',$

where the expression in the numerator is defined in (2.4.4), it is clear that
this ratio represents the average probability per unit area contained in
the interval $I_2$. If the limit of the ratio exists as $x_1'' \to x_1'$ and $x_2'' \to x_2'$, this limit is
$f(x_1', x_2') \ge 0$, that is, a non-negative density at $(x_1', x_2')$. If at a given point
$f(x_1, x_2)$ is given by (2.5.7) we shall say that the probability density of
$(x_1, x_2)$ exists at that point. We shall call $f(x_1, x_2)$ the probability density
function (p.d.f.) of $(x_1, x_2)$, and $f(x_1, x_2)\, dx_1\, dx_2$ the probability element
(p.e.) of $(x_1, x_2)$. Summarizing,


2.5.3 The c.d.f. $F(x_1, x_2)$ of a continuous random variable $(x_1, x_2)$ is
uniquely determined by the p.d.f. $f(x_1, x_2)$ in accordance with (2.5.6).
Conversely, $f(x_1, x_2)$ is determined by $F(x_1, x_2)$ in accordance with
(2.5.7) except possibly for a set of points in $R_2$ having probability 0.

If $E$ is any (Borel) set in $R_2$, we have $P(E)$ given by the following
Lebesgue integral:

(2.5.9)   $P((x_1, x_2) \in E) = \iint_E f(y_1, y_2)\, dy_1\, dy_2.$

In particular, we have for the marginal c.d.f. of $x_1$

(2.5.10)   $F_1(x_1') = P(x_1 \le x_1') = \int_{-\infty}^{x_1'} \int_{-\infty}^{\infty} f(y_1, y_2)\, dy_2\, dy_1.$

When we consider the integral in (2.5.10) as an iterated integral, the
function

(2.5.11)   $f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, y_2)\, dy_2$

is called the marginal p.d.f. of $x_1$. The marginal p.d.f. of $x_2$ is similarly
defined.

The following statement furnishes a useful criterion for statistical
independence of two random variables $x_1$ and $x_2$ and can be verified by the
reader.

2.5.4 If $(x_1, x_2)$ is a continuous random variable having p.d.f. $f(x_1, x_2)$, a
necessary and sufficient condition for $x_1$ and $x_2$ to be independent is
that

$f(x_1, x_2) \equiv f_1(x_1) \cdot f_2(x_2).$

Example. As a simple example of a two-dimensional continuous random
variable suppose two numbers are independently “drawn” from the interval (0, 1),
all numbers being “equally likely” to be drawn. Let $(x_1, x_2)$ be the
(two-dimensional) random variable such that $x_1$ and $x_2$ denote respectively the
smaller and larger numbers drawn. We define the p.d.f. of $(x_1, x_2)$ as follows: $f(x_1, x_2) = 2$
inside the triangle having vertices (0, 0), (1, 1), and (0, 1) in the $x_1 x_2$-plane, and 0
at all other points in the plane. The reader will see that the marginal p.d.f. of $x_1$ is

$f_1(x_1) = \int_{-\infty}^{\infty} f(x_1, y_2)\, dy_2 = \int_{x_1}^{1} 2\, dy_2 = 2(1 - x_1), \qquad 0 < x_1 < 1,$

and similarly that $f_2(x_2) = 2 x_2$, $0 < x_2 < 1$.
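
The two marginal p.d.f.'s of this example can be checked by carrying out the integration over the other variable numerically. The following sketch is illustrative only: it uses a simple midpoint rule (the helper `integrate` and the step count are choices made here, not part of the text) applied to the density that equals 2 on the triangle and 0 elsewhere, and compares the results with $2(1 - x_1)$ and $2 x_2$.

```python
def f(x1, x2):
    """Joint p.d.f.: 2 on the triangle 0 < x1 < x2 < 1, and 0 elsewhere."""
    return 2.0 if 0.0 < x1 < x2 < 1.0 else 0.0

def integrate(g, a, b, n=20000):
    """Midpoint-rule approximation of the integral of g over (a, b)."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f1(x1):
    """Marginal p.d.f. of the smaller number: integrate the joint p.d.f. over x2."""
    return integrate(lambda y2: f(x1, y2), 0.0, 1.0)

def f2(x2):
    """Marginal p.d.f. of the larger number: integrate the joint p.d.f. over x1."""
    return integrate(lambda y1: f(y1, x2), 0.0, 1.0)

for x in (0.25, 0.5, 0.75):
    # Columns: x, numerical f1, 2(1 - x), numerical f2, 2x
    print(x, round(f1(x), 3), round(2 * (1 - x), 3), round(f2(x), 3), round(2 * x, 3))
```

Since the product $2(1 - x_1) \cdot 2 x_2$ is not identically equal to the joint p.d.f., criterion 2.5.4 shows that the smaller and larger numbers are not independent.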

Various special cases of more important two-dimensional continuous 
random variables will be discussed in Chapter 7. 


(c) The Mixed Type 

In a mixed random variable $(x_1, x_2)$ one of the components is discrete
and the other is continuous. More precisely, if $F(x_1, x_2)$ is the c.d.f. of
$(x_1, x_2)$, where $x_1$ is discrete and $x_2$ is continuous, and if $x_1^{(\alpha)}$, $\alpha = 1, 2, \ldots$,
are the mass points of $x_1$ and if $p_1(x_1)$ is the p.f. of $x_1$, then there exist
one-dimensional conditional p.d.f.'s $f(x_2 \mid x_1^{(\alpha)})$, $\alpha = 1, 2, \ldots$, given by the
formula

(2.5.12)   $f(x_2 \mid x_1^{(\alpha)}) = \dfrac{1}{p_1(x_1^{(\alpha)})} \cdot \dfrac{\partial}{\partial x_2} \left[ F(x_1^{(\alpha)}, x_2) - F(x_1^{(\alpha)} - 0, x_2) \right]$

such that

(2.5.13)   $F(x_1, x_2) = \sum_{x_1^{(\alpha)} \le x_1} p_1(x_1^{(\alpha)}) \int_{-\infty}^{x_2} f(y \mid x_1^{(\alpha)})\, dy.$

The marginal c.d.f.'s $F_1(x_1)$ and $F_2(x_2)$ are given by the following
formulas:

(2.5.14)   $F_1(x_1) = \sum_{x_1^{(\alpha)} \le x_1} p_1(x_1^{(\alpha)})$

and

(2.5.15)   $F_2(x_2) = \sum_{\alpha} p_1(x_1^{(\alpha)}) \int_{-\infty}^{x_2} f(y \mid x_1^{(\alpha)})\, dy.$

We therefore have the following statement:

2.5.5 The c.d.f. $F(x_1, x_2)$ of a mixed random variable $(x_1, x_2)$ where $x_1$
is discrete and $x_2$ is continuous is uniquely determined by the pairs
$[p_1(x_1^{(\alpha)}), f(x_2 \mid x_1^{(\alpha)})]$, $\alpha = 1, 2, \ldots$, and conversely.

For purposes of interpretation, a mixed random variable $(x_1, x_2)$, where
$x_1$ is discrete with p.f. $p_1(x_1)$ and $x_2$ is continuous with conditional p.d.f.
$f(x_2 \mid x_1)$, may be viewed in the following way. First, the probability 1 is
partitioned into pieces of magnitude $p_1(x_1^{(\alpha)})$, $\alpha = 1, 2, \ldots$, and these
pieces are placed at the mass points $x_1^{(\alpha)}$, $\alpha = 1, 2, \ldots$, respectively on the
$x_1$-axis. Then these pieces of probability are continuously “smeared”
along the vertical lines $x_1 = x_1^{(\alpha)}$, $\alpha = 1, 2, \ldots$, in such a way that the
density at any point $(x_1^{(\alpha)}, x_2)$ on the $\alpha$th line is $p_1(x_1^{(\alpha)})\, f(x_2 \mid x_1^{(\alpha)})$.

It will be noted that if the $f(x_2 \mid x_1^{(\alpha)})$ are identical for all $\alpha$, then

(2.5.16)   $f(x_2 \mid x_1^{(\alpha)}) = f_2(x_2),$

where $f_2(x_2)$ is the p.d.f. of $x_2$, and $F(x_1, x_2) = F_1(x_1) \cdot F_2(x_2)$; that is, $x_1$
and $x_2$ are independent. Conversely, if $x_1$ and $x_2$ are independent, then
$F(x_1, x_2) \equiv F_1(x_1) \cdot F_2(x_2)$ and (2.5.12) reduces to (2.5.16). Hence

2.5.6 If $(x_1, x_2)$ is a mixed random variable where $x_1$ is discrete and $x_2$
is continuous, a necessary and sufficient condition for $x_1$ and $x_2$
to be independent is that $f(x_2 \mid x_1^{(\alpha)}) = f_2(x_2)$ for all $\alpha$, where $f_2(x_2)$
is the marginal p.d.f. of $x_2$.

Example. As stated earlier, examples of mixed random variables are
rarer than those of the discrete or continuous type, but the following simple
artificial example at this point might strengthen further the idea of a mixed
random variable. Suppose a die is thrown, letting $x_1$ be the random variable
denoting the number of dots occurring. Then if $x_1 = x_1^{(\alpha)}$ ($x_1^{(1)} = 1, \ldots, x_1^{(6)} = 6$),
suppose $x_1^{(\alpha)}$ numbers are drawn independently from a “uniform” distribution on
the interval (0, 1), letting $x_2$ denote the largest of the numbers obtained.
Then $(x_1, x_2)$ is a mixed random variable with $x_1$ discrete and having p.f.
$p_1(x_1^{(\alpha)}) = \frac{1}{6}$, $x_1^{(1)} = 1, \ldots, x_1^{(6)} = 6$, and $x_2$ is continuous, such that

$f(x_2 \mid x_1^{(\alpha)}) = x_1^{(\alpha)}\, x_2^{\,x_1^{(\alpha)} - 1}$

for $x_1^{(1)} = 1, \ldots, x_1^{(6)} = 6$ and $0 < x_2 < 1$, with $f(x_2 \mid x_1^{(\alpha)}) = 0$ for all values of
$x_2$ outside (0, 1). One distribution of possible interest in this example is the
marginal c.d.f. $F_2(x_2)$, which for $0 < x_2 < 1$ is given by

$F_2(x_2) = \frac{1}{6}(x_2 + x_2^2 + \cdots + x_2^6).$

From this we see that the p.d.f. of $x_2$ is

$f_2(x_2) = \frac{1}{6}(1 + 2 x_2 + \cdots + 6 x_2^5).$
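
The marginal distribution in the die example can be evaluated exactly as a mixture over the six faces. The sketch below is an illustration under the assumptions of the example (a fair die, and the largest of that many independent uniform draws, whose conditional c.d.f. is the corresponding power of $x_2$); the names `cond_pdf`, `F2`, and `f2` are introduced here only for this purpose.

```python
from fractions import Fraction

FACES = range(1, 7)
P1 = Fraction(1, 6)  # p.f. of the die: each face has probability 1/6

def cond_pdf(x2, face):
    """Conditional p.d.f. of x2 given the face: largest of `face` uniform draws."""
    return face * x2 ** (face - 1) if 0 < x2 < 1 else 0

def F2(x2):
    """Marginal c.d.f. of x2 for 0 <= x2 <= 1: mixture of the conditional c.d.f.'s."""
    return sum(P1 * x2 ** face for face in FACES)

def f2(x2):
    """Marginal p.d.f. of x2: mixture of the conditional p.d.f.'s."""
    return sum(P1 * cond_pdf(x2, face) for face in FACES)

half = Fraction(1, 2)
print(F2(half), f2(half))

# A central difference of F2 should be close to f2 at the same point.
d = Fraction(1, 10**6)
slope = (F2(half + d) - F2(half - d)) / (2 * d)
print(float(slope), float(f2(half)))
```

Because the conditional p.d.f. changes with the face shown, the criterion for independence of a discrete and a continuous component is not met, so the two components of this example are dependent.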

Other examples of mixed random variables occur in Problems 8.34 and 8.35
at the end of Chapter 8.


2.6 DISTRIBUTION FUNCTIONS OF k-DIMENSIONAL
RANDOM VARIABLES


(a) General Properties


The reader will now see how the results of Sections 2.4 and 2.5 for
two-dimensional random variables can be extended to the case of a
$k$-dimensional random variable. It is sufficient to outline the extension only
briefly. For a detailed analysis of general multidimensional distributions
the reader is referred to von Neumann (1950).

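Suppose $(x_1, \ldots, x_k)$ is a $k$-dimensional random variable whose
probability space is $(R_k, \mathcal{B}_k, P)$. Let $E_{(x_1', \ldots, x_k')}$ be the set
$(-\infty, \ldots, -\infty;\ x_1', \ldots, x_k']$ in $R_k$. Then

(2.6.1)   $F(x_1', \ldots, x_k') = P(E_{(x_1', \ldots, x_k')}) = P(-\infty < x_i \le x_i',\ i = 1, \ldots, k);$

$F(x_1, \ldots, x_k)$ is clearly a single-valued, real, and non-negative function of
$(x_1, \ldots, x_k)$ in $R_k$.

Now any interval $I_k$ in $R_k$ of the form $(x_1', \ldots, x_k';\ x_1'', \ldots, x_k'']$ belongs
to $\mathcal{B}_k$ since

(2.6.2)   $I_k = E_{(x_1'', \ldots, x_k'')} - \left[ E_{(x_1', x_2'', \ldots, x_k'')} \cup \cdots \cup E_{(x_1'', \ldots, x_{k-1}'', x_k')} \right].$

Furthermore the probability that $(x_1, \ldots, x_k) \in I_k$ can be found in
terms of $F(x_1, \ldots, x_k)$ by extending (2.4.3) to the case of $k$ variables. This
gives

(2.6.3)   $P((x_1, \ldots, x_k) \in I_k) = F(x_1'', \ldots, x_k'')$
          $- [F(x_1', x_2'', \ldots, x_k'') + \cdots + F(x_1'', \ldots, x_{k-1}'', x_k')]$
          $+ [F(x_1', x_2', x_3'', \ldots, x_k'') + \cdots + F(x_1'', \ldots, x_{k-2}'', x_{k-1}', x_k')]$
          $- \cdots$
          $+ (-1)^k F(x_1', \ldots, x_k').$

It is convenient to denote the expression on the right by $\Delta^k_{I_k} F(x_1, \ldots, x_k)$,
the $k$th difference of $F(x_1, \ldots, x_k)$ over $I_k$. We then have

(2.6.4)   $P((x_1, \ldots, x_k) \in I_k) = \Delta^k_{I_k} F(x_1, \ldots, x_k) \ge 0.$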
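
The alternating sum that defines the $k$th difference of a c.d.f. over an interval can be written as a sum over the $2^k$ corners of the interval, each corner carrying the sign $(-1)^m$, where $m$ is the number of coordinates taken at the lower end. The following Python sketch is an illustration only; the c.d.f. `F_uniform` (independent components, each uniform on (0, 1)) is an assumption chosen because the probability of an interval inside the unit cube is then simply the product of its side lengths.

```python
from itertools import product

def kth_difference(F, lower, upper):
    """Signed sum of F over the 2^k corners of the interval (lower; upper]."""
    k = len(lower)
    total = 0.0
    for pick_lower in product((False, True), repeat=k):
        corner = [lo if use_lo else hi
                  for use_lo, lo, hi in zip(pick_lower, lower, upper)]
        # Sign is (-1)^(number of coordinates taken at the lower end).
        sign = -1 if sum(pick_lower) % 2 else 1
        total += sign * F(*corner)
    return total

def clamp(t):
    """Restrict t to the closed interval [0, 1]."""
    return min(max(t, 0.0), 1.0)

def F_uniform(*x):
    """Assumed c.d.f.: independent components, each uniform on (0, 1)."""
    out = 1.0
    for t in x:
        out *= clamp(t)
    return out

prob = kth_difference(F_uniform, (0.1, 0.2, 0.3), (0.6, 0.5, 0.9))
print(round(prob, 6))  # compare with 0.5 * 0.3 * 0.6
```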

By an argument similar to that used in establishing (2.4.5), it follows that

(2.6.5)   $F(-\infty, x_2, \ldots, x_k) = \cdots = F(x_1, x_2, \ldots, -\infty) = 0.$

Also, we can extend the argument leading to (2.4.6) and show that

(2.6.6)   $F(+\infty, \ldots, +\infty) = 1.$

Furthermore, the $k$-dimensional analogue of (2.4.7) can be verified,
that is,

(2.6.7)   $F(x_1, \ldots, x_i + 0, \ldots, x_k) = F(x_1, \ldots, x_k), \qquad i = 1, \ldots, k.$

It can also be verified that the value of $F(x_1, \ldots, x_k)$ is unchanged if
any set of $x$'s, say $x_{i_1}, \ldots, x_{i_r}$, are replaced by $x_{i_1} + 0, \ldots, x_{i_r} + 0$
respectively.

Therefore, if $(x_1, \ldots, x_k)$ is a $k$-dimensional random variable having
probability space $(R_k, \mathcal{B}_k, P)$, there exists a function $F(x_1, \ldots, x_k)$ defined
at every point $(x_1, \ldots, x_k)$ in $R_k$ by (2.6.1) and having properties (2.6.4)
through (2.6.7).

Conversely, if $F(x_1, \ldots, x_k)$ is defined by (2.6.1) and has properties
(2.6.4) through (2.6.7), we obtain a probability space $(R_k, \mathcal{B}_k, P)$.

Summarizing, we have the $k$-dimensional extension of 2.4.1, namely,

2.6.1 The probability space $(R_k, \mathcal{B}_k, P)$ of a $k$-dimensional random
variable $(x_1, \ldots, x_k)$ uniquely determines a single-valued, real,
and non-negative function $F(x_1, \ldots, x_k)$ defined by (2.6.1) at every
point in $R_k$ and having the following properties:

(2.6.8)   (a) $\Delta^k_{I_k} F(x_1, \ldots, x_k) \ge 0$

          (b) $F(-\infty, x_2, \ldots, x_k) = \cdots = F(x_1, \ldots, x_{k-1}, -\infty) = 0$

          (c) $F(+\infty, \ldots, +\infty) = 1$

          (d) $F(x_1, \ldots, x_{i-1}, x_i + 0, x_{i+1}, \ldots, x_k) = F(x_1, \ldots, x_k), \qquad i = 1, \ldots, k.$

Conversely, $F(x_1, \ldots, x_k)$ defined by $F(x_1, \ldots, x_k) = P(E_{(x_1, \ldots, x_k)})$
and having these properties uniquely determines a probability space
$(R_k, \mathcal{B}_k, P)$.

The function $F(x_1, \ldots, x_k)$ is called the distribution function (d.f.) or
cumulative distribution function (c.d.f.) of the $k$-dimensional random
variable $(x_1, \ldots, x_k)$. $F(x_1, \ldots, x_k)$ is sometimes referred to as a
$k$-variate c.d.f.

It will be noted that the probability that $x_1 = x_1'', \ldots, x_k = x_k''$ is
obtained by taking the limit of the right-hand side of (2.6.3) as $x_i' \to x_i''$,
$i = 1, \ldots, k$, that is,

(2.6.9)   $P(x_1 = x_1'', \ldots, x_k = x_k'') = \lim_{x_i' \to x_i''} \Delta^k_{I_k} F(x_1, \ldots, x_k).$

(b) Marginal Distributions

The marginal c.d.f. of $x_1$, $F_1(x_1)$, is defined as

(2.6.10)   $F_1(x_1) = F(x_1, +\infty, \ldots, +\infty),$

the other marginal c.d.f.'s $F_i(x_i)$, $i = 2, \ldots, k$, being similarly defined.
More generally, the marginal c.d.f. of $(x_1, \ldots, x_{k_1})$, $k_1 < k$, is defined by

(2.6.11)   $F_{1 \cdots k_1}(x_1, \ldots, x_{k_1}) = F(x_1, \ldots, x_{k_1}, +\infty, \ldots, +\infty),$

a similar definition holding, of course, for any other subset of the
components of $(x_1, \ldots, x_k)$.

(c) Independence of Two or More Vector Random Variables

If $(x_1, \ldots, x_{k_1})$ and $(x_{k_1+1}, \ldots, x_k)$, where $k = k_1 + k_2$, are vector
random variables whose probability spaces are $(R_{k_1}, \mathcal{B}_{k_1}, P^{(1)})$ and
$(R_{k_2}, \mathcal{B}_{k_2}, P^{(2)})$ respectively, then these two vector random variables are
said to be independent (see Section 1.6) if for every set $E_k$ in $R_k = R_{k_1} \times R_{k_2}$
of the form $E_{k_1} \times E_{k_2}$, where $E_{k_1}$ and $E_{k_2}$ are Borel sets in $R_{k_1}$
and $R_{k_2}$ respectively, we have $P(E_k) = P^{(1)}(E_{k_1}) \cdot P^{(2)}(E_{k_2})$.

As in the two-dimensional case, independence can be more usefully
expressed in terms of c.d.f.'s, in accordance with the following statement
which can be verified by the reader.

2.6.2 If $(x_1, \ldots, x_k)$ is a random variable having c.d.f. $F(x_1, \ldots, x_k)$, a
necessary and sufficient condition for $(x_1, \ldots, x_{k_1})$ and $(x_{k_1+1}, \ldots, x_k)$
to be independent is that

(2.6.12)   $F(x_1, \ldots, x_k) \equiv F_{1 \cdots k_1}(x_1, \ldots, x_{k_1}) \cdot F_{k_1+1 \cdots k}(x_{k_1+1}, \ldots, x_k),$

where the two functions on the right are the marginal c.d.f.'s of
$(x_1, \ldots, x_{k_1})$ and $(x_{k_1+1}, \ldots, x_k)$, respectively.

The notion of independence can be extended in an obvious manner to
the case of three or more mutually exclusive subsets of the components of
$(x_1, \ldots, x_k)$. In particular, it should be observed that $x_1, \ldots, x_k$ are
mutually independent if and only if

(2.6.13)   $F(x_1, \ldots, x_k) \equiv F_1(x_1) \cdots F_k(x_k).$

2.7 COMMON TYPES OF k-DIMENSIONAL
RANDOM VARIABLES

As in the two-dimensional case, there are three common types of
$k$-dimensional random variables: discrete, continuous, and mixed random
variables.

In the discrete type, $F(x_1, \ldots, x_k)$ is a step function such that the right
side of (2.6.9) is zero at all points in $R_k$ except at a finite or countably
infinite number of points $(x_1^{(\alpha)}, \ldots, x_k^{(\alpha)})$, $\alpha = 1, 2, \ldots$, which have no
finite limit points in $R_k$, at which

(2.7.1)   $P(x_1 = x_1^{(\alpha)}, \ldots, x_k = x_k^{(\alpha)}) = p(x_1^{(\alpha)}, \ldots, x_k^{(\alpha)}) > 0$

and such that

(2.7.2)   $\sum_{\alpha} p(x_1^{(\alpha)}, \ldots, x_k^{(\alpha)}) = 1.$

The function $p(x_1, \ldots, x_k)$, which has the value 0 at all points in $R_k$ except
$(x_1^{(\alpha)}, \ldots, x_k^{(\alpha)})$, $\alpha = 1, 2, \ldots$, at which the value is given by (2.6.9) and
denoted by (2.7.1), is the p.f. of $(x_1, \ldots, x_k)$. In case there
exists only one value of $\alpha$, say $\alpha = 1$, then $p(x_1^{(1)}, \ldots, x_k^{(1)}) = 1$ and
$(x_1, \ldots, x_k)$ is a degenerate random variable. Extensions of (2.5.3) and
2.5.1 are straightforward.

The reader will readily see how the marginal p.f. $p_1(x_1)$, or, more generally,
$p_{1 \cdots k_1}(x_1, \ldots, x_{k_1})$, is defined, and how 2.6.2 can be restated in terms of
p.f.'s. The definition of degeneracy can be extended, of course, to marginal
distributions. It is important to note in particular that $x_1, \ldots, x_k$ are
mutually independent if and only if

(2.7.3)   $p(x_1, \ldots, x_k) \equiv p_1(x_1) \cdots p_k(x_k).$


In the case of a continuous $k$-dimensional random variable $(x_1, \ldots, x_k)$
there exists a non-negative Lebesgue-measurable function $f(x_1, \ldots, x_k)$
such that

(2.7.4)   $F(x_1, \ldots, x_k) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_k} f(y_1, \ldots, y_k)\, dy_k \cdots dy_1$

for all $(x_1, \ldots, x_k)$ in $R_k$, where $\dfrac{\partial^k F(x_1, \ldots, x_k)}{\partial x_1 \cdots \partial x_k}$ exists and

(2.7.5)   $\dfrac{\partial^k F(x_1, \ldots, x_k)}{\partial x_1 \cdots \partial x_k} = f(x_1, \ldots, x_k)$

at all points in $R_k$ except possibly for a set of probability 0.

Conditions under which (2.7.5) is valid are $k$-dimensional versions of
those for the one- and two-dimensional cases and are left to the reader
to formulate.

The $k$-dimensional extension of 2.5.3 is left to the reader.

The marginal p.d.f. $f_i(x_i)$ of any component $x_i$ of $(x_1, \ldots, x_k)$, or the
p.d.f. of any subset of the components, say $f_{1 \cdots k_1}(x_1, \ldots, x_{k_1})$, are defined
from the corresponding c.d.f.'s in an obvious manner.

It will be seen that 2.6.2 can be restated in terms of p.d.f.'s in the case of
continuous random variables. Furthermore, the components $x_1, \ldots, x_k$
of the random variable $(x_1, \ldots, x_k)$ are mutually independent if and only
if

(2.7.6)   $f(x_1, \ldots, x_k) \equiv f_1(x_1) \cdots f_k(x_k).$

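If $(x_1, \ldots, x_{k_1})$ is a $k_1$-dimensional discrete random variable and
$(x_{k_1+1}, \ldots, x_k)$, $k = k_1 + k_2$, is a $k_2$-dimensional continuous random
variable, then $(x_1, \ldots, x_k)$ is a mixed $k$-dimensional random variable.
Such a random variable is defined by conditional p.d.f.'s
$f(x_{k_1+1}, \ldots, x_k \mid x_1^{(\alpha)}, \ldots, x_{k_1}^{(\alpha)})$ given by the formula

(2.7.7)   $f(x_{k_1+1}, \ldots, x_k \mid x_1^{(\alpha)}, \ldots, x_{k_1}^{(\alpha)}) = \dfrac{1}{p_{1 \cdots k_1}(x_1^{(\alpha)}, \ldots, x_{k_1}^{(\alpha)})} \cdot \dfrac{\partial^{k_2}}{\partial x_{k_1+1} \cdots \partial x_k} \left[ \Delta^{k_1}_{(\alpha)} F(x_1, \ldots, x_k) \right],$

where $p_{1 \cdots k_1}(x_1, \ldots, x_{k_1})$ is the p.f. of $(x_1, \ldots, x_{k_1})$ and where

(2.7.8)   $\Delta^{k_1}_{(\alpha)} F(x_1, \ldots, x_k) = \lim \left[ \Delta^{k_1}_{I_{k_1}} F(x_1, \ldots, x_k) \right],$

the limit being taken as $x_i' \to x_i^{(\alpha)} - 0$ with $x_i'' = x_i^{(\alpha)}$, $i = 1, \ldots, k_1$;
$\Delta^{k_1}_{I_{k_1}} F(x_1, \ldots, x_k)$ is the $k_1$th difference of $F(x_1, \ldots, x_k)$ with respect to
$(x_1, \ldots, x_{k_1})$ over the $k_1$-dimensional interval $(x_1', \ldots, x_{k_1}';\ x_1'', \ldots, x_{k_1}'']$,
where $x_{k_1+1}, \ldots, x_k$ are held fixed.

The $k$-dimensional analogues of 2.5.5 and 2.5.6 are straightforward
extensions and are left as exercises for the reader.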


2.8 FUNCTIONS OF RANDOM VARIABLES 

(a) Functions of One Random Variable 

It will be recalled from Section 2.2 that a (one-dimensional) random
variable $x$ has a probability space $(R_1, \mathcal{B}_1, P)$, or equivalently a c.d.f.
$F(x)$. We often have to deal with the probability theory of a measurable
function* $g(x)$ of the random variable $x$, where $g(x)$ is real, single-valued
and defined at every point $x$ in $R_1$ except possibly for a set of probability
0, and where the set of points in $R_1$ for which $g(x) \le y$, for every real
number $y$, also belongs to the class $\mathcal{B}_1$ of Borel sets in $R_1$. For example,
if $x$ is a random variable such elementary functions as polynomials in $x$,
$\sin x$, $e^x$, etc., are obviously measurable.

It follows from the fact that $(R_1, \mathcal{B}_1, P)$ is the probability space of
a random variable $x$, and from the definition of $g(x)$, that for any real
number $y$, the set of values of $x$ for which $g(x) \le y$ belongs to $\mathcal{B}_1$
and hence the probability assigned to this set is provided by $(R_1, \mathcal{B}_1, P)$.

* Such functions are also called Baire functions.

Let us denote the probability of the inequality $g(x) \le y$, for a fixed $y$,
by $H(y)$. Then

(2.8.1)   $H(y) = P(g(x) \le y) = P(x \in E_y),$

where $E_y$ denotes the set of points $x$ (in $R_1$) for which $g(x) \le y$.

It can be readily verified that $H(y)$ has all the properties of a c.d.f. and
hence we have:


2.8.1 If $x$ is a random variable and $g(x)$ is a random variable denoted by $y$,
then $H(y)$ defined by (2.8.1) is the c.d.f. of $y$.

Now suppose $g(x)$ is strictly monotonic and has a continuous nonvanishing
derivative for all $x$ in some open interval $A$. Let $y = g(x)$ and $B$
be the image (interval) of $A$ in the $y$ space. Take $x' \in A$ and let $y' = g(x')$.
Then for some $\Delta y > 0$ there is a unique solution of the equation
$g(x) = y' + \Delta y$, which will be indicated by $x = g^{-1}(y' + \Delta y)$.

If the c.d.f. of $x$ is $F(x)$, we have

(2.8.2)   $H(y' + \Delta y) - H(y') = \pm [F(g^{-1}(y' + \Delta y)) - F(g^{-1}(y'))],$

the plus sign holding if $g(x)$ is monotonically increasing and the minus sign
if $g(x)$ is monotonically decreasing.

Dividing both sides of (2.8.2) by $\Delta y$ we may write

(2.8.3)   $\dfrac{H(y' + \Delta y) - H(y')}{\Delta y} = \left[ \dfrac{F(g^{-1}(y' + \Delta y)) - F(g^{-1}(y'))}{g^{-1}(y' + \Delta y) - g^{-1}(y')} \right] \cdot \left[ \dfrac{\pm (g^{-1}(y' + \Delta y) - g^{-1}(y'))}{\Delta y} \right].$

If $g(x)$ has a nonvanishing derivative at $x = x'$, $g^{-1}(y)$ has a nonvanishing
derivative at $y = y'$. Taking the limit of (2.8.3) as $\Delta y \to 0$, assuming
$F(x)$ has a derivative $f(x)$ for all $x \in A$ and dropping dashes, we obtain

$\dfrac{dH(y)}{dy} = f(x) \cdot \left| \dfrac{dx}{dy} \right|,$

where $x$ is replaced by $g^{-1}(y)$ on the right. We use the absolute value sign
to simplify the rule about using the + and − signs. Furthermore,

(2.8.4)   $\int_A f(x)\, dx = \int_B f(g^{-1}(y)) \left| \dfrac{dg^{-1}(y)}{dy} \right| dy.$

Summarizing:

2.8.2 Suppose $x$ is a continuous random variable with p.d.f. $f(x)$, and
$y = g(x)$ is strictly monotonic and has a continuous nonvanishing
derivative in some open interval $A$. Let $B$ be the image of $A$ in the
$y$ space. If $y$ is the random variable $g(x)$, then $y$ is a continuous
random variable whose p.d.f. $h(y)$ exists in $B$ and is given by

(2.8.5)   $h(y) = f(g^{-1}(y)) \left| \dfrac{dg^{-1}(y)}{dy} \right|,$

where $g^{-1}(y)$ is the solution of $g(x) = y$ for $x$. Furthermore, (2.8.4)
holds.


Sometimes we use the phraseology “the transformation $x = g^{-1}(y)$
carries the probability element $f(x)\, dx$ into the probability element
$h(y)\, dy$,” or more briefly, “$f(x)\, dx \to h(y)\, dy$,” where $h(y)$ is given by (2.8.5).
Formula (2.8.5) is, of course, the familiar rule for changing the variable
of integration in the integrand (a probability density function here) of an
ordinary integral.

Example. To illustrate the rule expressed by (2.8.5) suppose $x$ is a continuous
random variable having p.d.f. $f(x) = 1/a$ for $x$ in the interval $(0, a)$ and $f(x) = 0$
otherwise, and that we wish to find the p.d.f. of the random variable $x^n$. In this
case, $g(x) = x^n$ and $g^{-1}(y) = y^{1/n}$. Hence, applying (2.8.5),

$h(y) = \dfrac{1}{na}\, y^{1/n - 1}$

for $y$ in $(0, a^n)$ and 0 otherwise; that is, for the transformation $x = y^{1/n}$,

$\dfrac{1}{a}\, dx \to \dfrac{1}{na}\, y^{1/n - 1}\, dy.$
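
The density obtained from the change-of-variable rule can be checked against the c.d.f. of $x^n$ computed directly from its definition. The sketch below is illustrative only: the values $a = 2$ and $n = 3$, the interval of integration, and the midpoint-rule helper `integrate` are assumptions made for the check. It compares the integral of the transformed density over an interval with the increase of the directly computed c.d.f. over the same interval.

```python
def h(y, a, n):
    """p.d.f. of y = x**n by the change-of-variable rule, x uniform on (0, a)."""
    return y ** (1.0 / n - 1.0) / (n * a) if 0.0 < y < a ** n else 0.0

def H(y, a, n):
    """c.d.f. of y found directly: P(x**n <= y) = P(x <= y**(1/n)) = y**(1/n) / a."""
    if y <= 0.0:
        return 0.0
    return min(y ** (1.0 / n) / a, 1.0)

def integrate(g, lo, hi, steps=100000):
    """Midpoint-rule approximation of the integral of g over (lo, hi)."""
    w = (hi - lo) / steps
    return w * sum(g(lo + (i + 0.5) * w) for i in range(steps))

a, n = 2.0, 3          # hypothetical values chosen for the check
lo, hi = 1.0, 5.0      # an interval inside (0, a**n) = (0, 8)
by_density = integrate(lambda y: h(y, a, n), lo, hi)
by_cdf = H(hi, a, n) - H(lo, a, n)
print(round(by_density, 6), round(by_cdf, 6))
```

The interval is kept away from $y = 0$ because the density is unbounded there when $n > 1$, which would slow the convergence of a simple quadrature rule.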




(b) Functions of Two Random Variables 

Suppose we have a two-dimensional random variable $(x_1, x_2)$. A function
$g(x_1, x_2)$ of $x_1$ and $x_2$ is referred to as a random variable if $g(x_1, x_2)$
is real, single-valued, and defined at every point of $R_2$, the $x_1 x_2$-plane,
except possibly for a set of probability 0, and if for every real $y$ the set of
points of $R_2$ for which $g(x_1, x_2) \le y$ belongs to the Borel class of sets $\mathcal{B}_2$
in $R_2$. If $g(x_1, x_2)$ is a random variable, let us denote it by $y$; let the set
of points in the $x_1 x_2$-plane for which $g(x_1, x_2) \le y$ be denoted by $E_y$.

Since the random variable $(x_1, x_2)$ is characterized by a probability space
$(R_2, \mathcal{B}_2, P)$, or alternatively by a c.d.f. $F(x_1, x_2)$, the probability that
$(x_1, x_2) \in E_y$ is provided by $(R_2, \mathcal{B}_2, P)$ (and, of course, also by $F(x_1, x_2)$)
for any real number $y$. Let $H(y)$ be this probability. We have

(2.8.6)   $H(y) = P(g(x_1, x_2) \le y) = P((x_1, x_2) \in E_y).$

It can be verified that $H(y)$ is a c.d.f., namely, the c.d.f. of the random
variable $y = g(x_1, x_2)$.

Let us consider the vector function $(g_1(x_1, x_2), g_2(x_1, x_2))$ of the random
variable $(x_1, x_2)$, where $g_1(x_1, x_2)$ and $g_2(x_1, x_2)$ are real, single-valued,
and defined at every point of $R_2$, except possibly for a set of probability 0,
and where the set of points in the $x_1 x_2$-plane for which $g_i(x_1, x_2) \le y_i$,
$i = 1, 2$, belongs to $\mathcal{B}_2$ for every pair of real numbers $(y_1, y_2)$. For a given
$y_1$ and $y_2$ let this set of points be denoted by $E_{(y_1, y_2)}$. Then the probability
that $(x_1, x_2) \in E_{(y_1, y_2)}$ is provided by the probability space $(R_2, \mathcal{B}_2, P)$
of the random variable $(x_1, x_2)$. Let this probability be $H(y_1, y_2)$. Then

(2.8.7)   $H(y_1, y_2) = P(g_i(x_1, x_2) \le y_i,\ i = 1, 2) = P((x_1, x_2) \in E_{(y_1, y_2)}).$

It can be verified that $H(y_1, y_2)$ has all of the properties of a
two-dimensional c.d.f. and hence:

2.8.3 If $(x_1, x_2)$ is a two-dimensional random variable and if $(y_1, y_2)$ denotes
the random variable $(g_1(x_1, x_2), g_2(x_1, x_2))$, then $H(y_1, y_2)$ defined
by (2.8.7) is the c.d.f. of $(y_1, y_2)$.

The reader should note that the c.d.f. of one of the components of
$(y_1, y_2)$, say $y_1$, is given by

(2.8.8)   $H_1(y_1) = P(g_1(x_1, x_2) \le y_1) = P((x_1, x_2) \in E_{y_1}),$

where $E_{y_1}$ is the set of points in the $x_1 x_2$-plane for which $g_1(x_1, x_2) \le y_1$.
The important point about $H_1(y_1)$ is that although it is actually the marginal
c.d.f. of $y_1$, defined by letting $y_2 \to +\infty$ in the c.d.f. $H(y_1, y_2)$, it can be
obtained without first determining $H(y_1, y_2)$. In fact, (2.8.8) shows that
the introduction of the random variable $y_2 = g_2(x_1, x_2)$ is irrelevant if one
is interested in $y_1$ only.

The components of a two-dimensional random variable $(x_1, x_2)$ are
linearly dependent if there exist real constants $c_1$ and $c_2$, not both zero, such
that the random variable $c_1 x_1 + c_2 x_2$ is a degenerate random variable.
(Usually there is very little interest in the case in which one of the $c$'s
is zero, for in this situation one of the components $x_1$ or $x_2$ is itself
degenerate.) If $c_1 x_1 + c_2 x_2$ is degenerate when neither $c_1$ nor $c_2$ is zero, we shall
say that $x_1$ and $x_2$ are properly linearly dependent, which means, of course,
that $P(c_1 x_1 + c_2 x_2 = c_3) = 1$, where $c_3$ is a constant, and the line
$c_1 x_1 + c_2 x_2 = c_3$ is not parallel to either coordinate axis. If we rule out the
case in which one or more of the components of $(x_1, x_2)$ is degenerate,
then clearly the only type of linear dependence which will arise is proper
linear dependence. If $x_1$ and $x_2$ are not linearly dependent, they are said
to be linearly independent.

In the particular case where $x_1 - x_2$ is degenerate such that

$P(x_1 - x_2 = 0) = 1,$

then $x_1$ and $x_2$ are called equivalent random variables. If $F(x_1, x_2)$ is the
c.d.f. of two equivalent random variables $x_1$ and $x_2$, then $F_1(x_1)$ and
$F_2(x_2)$ are identical. The converse of this statement is, of course, false.

Two vector random variables are equivalent if their corresponding
components are equivalent random variables.

(c) Continuous Functions of Two Continuous Random Variables 

An important special case arises when $(x_1, x_2)$ is a continuous random
variable, and when the transformation $y_i = g_i(x_1, x_2)$, $i = 1, 2$, is
one-to-one between the $x_1 x_2$- and $y_1 y_2$-planes. In this case we can, under
certain regularity conditions, give an explicit expression for the p.d.f. of
$(y_1, y_2)$ in terms of the p.d.f. of $(x_1, x_2)$ and the first partial derivatives of
$g_1(x_1, x_2)$ and $g_2(x_1, x_2)$. More precisely, we may make the following
statement:


2.8.4 Suppose $(x_1, x_2)$ is a continuous random variable with p.d.f. $f(x_1, x_2)$.
Let $g_i(x_1, x_2)$, $i = 1, 2$, be single-valued and have continuous first
partial derivatives in some open region $A$ in the $x_1 x_2$-plane. Let
$y_i = g_i(x_1, x_2)$, $i = 1, 2$, have a unique inverse $x_i = g_i^{-1}(y_1, y_2)$, $i = 1, 2$,
for all points in $A$. Let $B$ be the image of $A$ in the $y_1 y_2$-plane. Let
$J$ be the Jacobian defined by the determinant

(2.8.9)   $J = \left| \dfrac{\partial x_i}{\partial y_j} \right|, \qquad i, j = 1, 2,$

having a non-zero value at all points in $A$. The p.d.f. $h(y_1, y_2)$ of
$(y_1, y_2)$ at $(y_1, y_2) \in B$ is given by

(2.8.10)   $h(y_1, y_2) = f(x_1, x_2) \cdot |J|,$

where, on the right of (2.8.10), $(x_1, x_2)$ are to be replaced by
$g_1^{-1}(y_1, y_2)$, $g_2^{-1}(y_1, y_2)$ respectively.

Furthermore,

(2.8.11)   $\iint_A f(x_1, x_2)\, dx_1\, dx_2 = \iint_B f(x_1, x_2)\, |J|\, dy_1\, dy_2.$

Theorem 2.8.4 is merely a statement, in terms of probability density
functions, of the familiar theorem to be found in advanced calculus texts,
on changing variables in a double integral. The reader interested in
details of the proof may refer, for example, to Widder (1947).

Example. Suppose $(x_1, x_2)$ is a random variable having p.d.f.

$f(x_1, x_2) = \begin{cases} e^{-(x_1 + x_2)}, & x_1 > 0,\ x_2 > 0 \\ 0, & \text{otherwise,} \end{cases}$

and that we are required to find the p.d.f. of the random variable $(x_1 + x_2, x_1/x_2)$.
The transformation involved here is

$y_1 = x_1 + x_2, \qquad y_2 = \dfrac{x_1}{x_2},$

and its inverse is

$x_1 = \dfrac{y_1 y_2}{1 + y_2}, \qquad x_2 = \dfrac{y_1}{1 + y_2}.$

This transformation provides a one-to-one mapping between points in the first
quadrant of the $x_1 x_2$-plane and the first quadrant of the $y_1 y_2$-plane. The absolute
value of the Jacobian of the transformation for all points in the first quadrant is

$\left| \dfrac{\partial(x_1, x_2)}{\partial(y_1, y_2)} \right| = \dfrac{y_1}{(1 + y_2)^2}.$

Hence we have for the p.d.f. of $(y_1, y_2)$

$h(y_1, y_2) = \begin{cases} e^{-y_1}\, \dfrac{y_1}{(1 + y_2)^2}, & y_1 > 0,\ y_2 > 0 \\ 0, & \text{otherwise.} \end{cases}$

Incidentally, it should be noted here (see 2.5.4) that $y_1$ and $y_2$ are independent
random variables.
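
The Jacobian and the resulting density in this example can be checked numerically. The sketch below is an illustration only and rests on two assumptions stated here: that the p.d.f. of $(x_1, x_2)$ is $e^{-(x_1 + x_2)}$ on the first quadrant (the form implied by the factor $e^{-y_1}$ in the result), and that the transformation is $y_1 = x_1 + x_2$, $y_2 = x_1/x_2$. The determinant of the partial derivatives of the inverse transformation is estimated by central differences and the product of the density and the absolute Jacobian is compared with the closed form and with its factored form.

```python
import math

def inverse(y1, y2):
    """Inverse of y1 = x1 + x2, y2 = x1 / x2 in the first quadrant."""
    return y1 * y2 / (1.0 + y2), y1 / (1.0 + y2)

def jacobian(y1, y2, d=1e-6):
    """Central-difference estimate of the determinant of dx_i/dy_j."""
    a1, a2 = inverse(y1 + d, y2), inverse(y1 - d, y2)
    b1, b2 = inverse(y1, y2 + d), inverse(y1, y2 - d)
    dx1_dy1, dx2_dy1 = (a1[0] - a2[0]) / (2 * d), (a1[1] - a2[1]) / (2 * d)
    dx1_dy2, dx2_dy2 = (b1[0] - b2[0]) / (2 * d), (b1[1] - b2[1]) / (2 * d)
    return dx1_dy1 * dx2_dy2 - dx1_dy2 * dx2_dy1

def f(x1, x2):
    """Assumed joint p.d.f. of (x1, x2): exp(-(x1 + x2)) on the first quadrant."""
    return math.exp(-(x1 + x2)) if x1 > 0 and x2 > 0 else 0.0

def h(y1, y2):
    """p.d.f. of (y1, y2): f evaluated at the inverse point, times |J|."""
    x1, x2 = inverse(y1, y2)
    return f(x1, x2) * abs(jacobian(y1, y2))

y1, y2 = 1.5, 0.7
closed_form = math.exp(-y1) * y1 / (1.0 + y2) ** 2
# Product of a function of y1 alone and a function of y2 alone.
factored = (y1 * math.exp(-y1)) * (1.0 / (1.0 + y2) ** 2)
print(round(h(y1, y2), 6), round(closed_form, 6), round(factored, 6))
```

The factored form, a function of $y_1$ alone times a function of $y_2$ alone, each integrating to 1 over the positive half-line, is what makes the independence of $y_1$ and $y_2$ visible.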


(d) Functions of A:-Dimensional Random Variables 

The preceding results extend in a straightforward way to several functions
of several random variables. If $(x_1, \ldots, x_k)$ is a $k$-dimensional random
variable, a function $g(x_1, \ldots, x_k)$, which we shall call $y$, is also
a random variable if it is real, single-valued, defined at all points in $R_k$
except possibly for a set of probability 0, and if the set of points in $R_k$
(the $x$-space) for which $g(x_1, \ldots, x_k) \le y$ belongs to the Borel sets $\mathcal{B}_k$
of $R_k$ for all real values of $y$. If, for a given $y$, $E_y$ denotes this set of points,
then the c.d.f. of $y$ is given by

(2.8.12)   $H(y) = P((x_1, \ldots, x_k) \in E_y).$

The definition of linear dependence extends to the case of a $k$-dimensional
random variable $(x_1, \ldots, x_k)$. We have linear dependence among
$x_1, \ldots, x_k$ if $c_1 x_1 + \cdots + c_k x_k$ is a degenerate random variable for some
constants $c_1, \ldots, c_k$ not all zero. If none of the $c_i$ is zero, we have proper
linear dependence. If $x_1, \ldots, x_k$ are not linearly dependent, they are said
to be linearly independent.

A vector function $(g_1(x_1, \ldots, x_k), \ldots, g_{k_1}(x_1, \ldots, x_k))$ of a random
variable $(x_1, \ldots, x_k)$, which we shall denote by $(y_1, \ldots, y_{k_1})$, is a
$k_1$-dimensional random variable if each of the components $g_1(x_1, \ldots, x_k)$,
$\ldots$, $g_{k_1}(x_1, \ldots, x_k)$ is real, single-valued, and defined at all points
$(x_1, \ldots, x_k)$ in $R_k$ except possibly for a set of probability 0, and if the set
of points $E_{(y_1, \ldots, y_{k_1})}$ in $R_k$ for which $g_i(x_1, \ldots, x_k) \le y_i$, $i = 1, \ldots, k_1$, for
any set of real numbers $y_1, \ldots, y_{k_1}$, belongs to $\mathcal{B}_k$. Thus, the probability
of the event $E_{(y_1, \ldots, y_{k_1})}$ is provided by the probability space $(R_k, \mathcal{B}_k, P)$,
or the c.d.f. $F(x_1, \ldots, x_k)$ of the random variable $(x_1, \ldots, x_k)$. The
function $H(y_1, \ldots, y_{k_1})$ defined by

(2.8.13)   $H(y_1, \ldots, y_{k_1}) = P((x_1, \ldots, x_k) \in E_{(y_1, \ldots, y_{k_1})})$

is the c.d.f. of $(y_1, \ldots, y_{k_1})$.

Now suppose $k_1 = k$ and $(x_1, \ldots, x_k)$ is a continuous random variable
with p.d.f. $f(x_1, \ldots, x_k)$. In some open region $A$ of the space of the $x$'s
let $y_i = g_i(x_1, \ldots, x_k)$, $i = 1, \ldots, k$, have a unique inverse
$x_i = g_i^{-1}(y_1, \ldots, y_k)$, $i = 1, \ldots, k$, where the $g_i(x_1, \ldots, x_k)$ possess continuous
first derivatives such that the Jacobian $J = \left| \dfrac{\partial x_i}{\partial y_j} \right| \ne 0$ in $A$. Let the image
of $A$ in the space of $(y_1, \ldots, y_k)$ be denoted by $B$. Then $(y_1, \ldots, y_k)$
is a continuous random variable having p.d.f. at a point $(y_1, \ldots, y_k)$
in $B$ given by

(2.8.14)   $h(y_1, \ldots, y_k) = f(x_1, \ldots, x_k) \cdot |J|,$

and furthermore,

(2.8.15)   $\int_A f(x_1, \ldots, x_k)\, dx_1 \cdots dx_k = \int_B f(x_1, \ldots, x_k)\, |J|\, dy_1 \cdots dy_k.$

These, of course, are the $k$-dimensional extensions of formulas (2.8.10)
and (2.8.11). It is understood that $x_i = g_i^{-1}(y_1, \ldots, y_k)$, $i = 1, \ldots, k$, that is,
that the $x$'s are to be expressed in terms of the $y$'s on the right-hand side of
(2.8.14) and (2.8.15).


2.9 CONDITIONAL DISTRIBUTION FUNCTIONS 
(a) General Remarks 

In Section 1.9 conditional probability was defined for events in the
general sample space $R$. Since we are primarily interested in events defined
by random variables it is useful to specialize the ideas of Section 1.10 for
the case where events are described by random variables. Suppose $x$ is a
random variable with probability space $(R_1, \mathcal{B}_1, P)$ or, equivalently, having
c.d.f. $F(x)$. Let $G$ be any event in $\mathcal{B}_1$ for which $P(G) > 0$, and $E_{x'}$ the
event $x \le x'$. Let

(2.9.1)   $F(x' \mid G) = \dfrac{P(E_{x'} \cap G)}{P(G)}.$

The reader can verify that $F(x \mid G)$ satisfies all of the conditions in (2.2.6)
for a c.d.f.; it is called the conditional c.d.f. of $x$ given $x \in G$. Conversely,
by 2.2.1, $F(x \mid G)$ uniquely determines a conditional probability space
on $(R_1, \mathcal{B}_1)$.

In a similar manner suppose $(x_1, \ldots, x_k)$ is a $k$-dimensional random
variable having c.d.f. $F(x_1, \ldots, x_k)$. We can define the conditional c.d.f.
of $(x_1, \ldots, x_k)$, given that $(x_1, \ldots, x_k) \in G$, where $G$ is a Borel set in $R_k$, as

(2.9.2)   $F(x_1, \ldots, x_k \mid G) = \dfrac{P(E_{(x_1, \ldots, x_k)} \cap G)}{P(G)},$

where $P(G) > 0$, and $E_{(x_1, \ldots, x_k)}$ is the $k$-dimensional interval
$(-\infty, \ldots, -\infty;\ x_1, \ldots, x_k]$.

Actually, we shall be interested in (2.9.2) mainly for situations in which
the event $G$ consists of sections of the $(x_1, \ldots, x_k)$-space obtained by
holding one or more of the components of $(x_1, \ldots, x_k)$ fixed. This
immediately leads us into difficulties in situations where both numerator
and denominator of the right-hand side of (2.9.2) are zero, as, for example,
in case $(x_1, \ldots, x_k)$ is a $k$-dimensional continuous random variable with a
p.d.f. $f(x_1, \ldots, x_k)$. Treatment of these difficulties is given in the following
paragraphs.


(b) Conditional Distribution Functions of Two-Dimensional Random 
Variables 


Let us consider the case of two random variables $x_1$, $x_2$, and let $I_1$,
the cylinder set in $R_2$ for which $x_1'' < x_1 \le x_1'$, be such that
$P(I_1) > 0$. Then we have from (2.9.2)

(2.9.3)    $F(x_1', x_2' \mid I_1) = \dfrac{P(E_{x_1' x_2'} \cap I_1)}{P(I_1)}.$

If $F(x_1, x_2)$ is the c.d.f. of $(x_1, x_2)$ and $F_1(x_1)$ is the marginal
c.d.f. of $x_1$, then $F(x_1', x_2' \mid I_1)$ may be written as

(2.9.4)    $F(x_1', x_2' \mid I_1) = \dfrac{F(x_1', x_2') - F(x_1'', x_2')}{F_1(x_1') - F_1(x_1'')}.$

Note that $F(x_1', x_2' \mid I_1)$, as a function of $x_2'$, is a c.d.f. If the
limit on the right exists as $x_1'' \to x_1'$, let it be denoted by
$F(x_2' \mid x_1')$, that is, let

(2.9.5)    $\lim_{x_1'' \to x_1'} F(x_1', x_2' \mid I_1) = F(x_2' \mid x_1').$

If $F(x_2' \mid x_1')$ exists for every $x_2'$ it can be verified that
$F(x_2' \mid x_1')$ considered as a function of $x_2'$ is a c.d.f.; that is,
it has the basic properties listed in (2.2.6). It is called the conditional
c.d.f. of $x_2$ given $x_1 = x_1'$, or dropping the dash we may say, for
brevity, when there is no possibility of ambiguity, that $F(x_2 \mid x_1)$
is the c.d.f. of the conditional random variable $x_2 \mid x_1$.
$F(x_2 \mid x_1)$ is a c.d.f. in which $x_1$ essentially plays the role of a
parameter; $x_1$ is sometimes called a fixed variable to emphasize that it
is not a random variable in the definition of $F(x_2 \mid x_1)$. The
quantity $x_2 \mid x_1$ may be regarded as a one-dimensional random variable
whose c.d.f. $F(x_2 \mid x_1)$ for a fixed value of $x_1$, say $x_1'$, is
defined at all points in the $x_1x_2$-plane for which $x_1 = x_1'$ in such a
way that, roughly speaking, $F(x_2' \mid x_1')$ is the amount of probability
(or probability density) lying along the portion of the line $x_1 = x_1'$
for which $x_2 \le x_2'$ expressed as a fraction of the probability (or
probability density) lying along the entire line.

The preceding discussion provides a rather elementary approach to
conditional random variables and their c.d.f.'s which is adequate for nearly
all distributions arising in mathematical statistics. A more general
approach can be made by use of the Radon-Nikodym theorem 1.10.1.

There are two important special types of conditional random variables 
for the case of two dimensions which deserve special attention. 


Type A. In this type the component $x_1$ of the random variable $(x_1, x_2)$
is discrete. Thus, in (2.9.2) for $k = 2$ assume $x_1$ is a discrete random
variable with mass points $x_1^{(\alpha)}$, $\alpha = 1, 2, \ldots$. Let $G$
be the set in $R_2$, the sample space of $(x_1, x_2)$, for which
$x_1 = x_1^{(\beta)}$. Then

    $P(G) = P(x_1 = x_1^{(\beta)}) = p_1(x_1^{(\beta)}) > 0.$

Using (2.9.2) for $k = 2$ we find at once

(2.9.6)    $F(x_1^{(\beta)}, x_2' \mid x_1 = x_1^{(\beta)}) = \dfrac{P(x_1 = x_1^{(\beta)},\ x_2 \le x_2')}{P(x_1 = x_1^{(\beta)})},$

which we shall denote by $F(x_2' \mid x_1^{(\beta)})$. The function
$F(x_2 \mid x_1^{(\beta)})$ is a c.d.f., and we shall refer to it as the
c.d.f. of the conditional random variable $x_2 \mid x_1^{(\beta)}$. For a
Type A conditional random variable, we may therefore define
$F(x_2 \mid x_1^{(\beta)})$ directly from (2.9.2) for $k = 2$ and with $G$
equal to the set of points in $R_2$ for which $x_1 = x_1^{(\beta)}$.
Definition (2.9.5) also leads to the same result for
$F(x_2 \mid x_1^{(\beta)})$ by choosing $x_1' = x_1^{(\beta)}$ in defining
$I_1$.

Note that $x_2 \mid x_1^{(\beta)}$ is a random variable whose sample space is
on the line in the $x_1x_2$-plane for which $x_1 = x_1^{(\beta)}$. The c.d.f.
of $x_2 \mid x_1^{(\beta)}$ is $F(x_2 \mid x_1^{(\beta)})$. When there is no
ambiguity we may drop the $\beta$ and refer to $F(x_2 \mid x_1)$ as the
c.d.f. of the conditional random variable $x_2 \mid x_1$.

There are two common cases of conditional random variables under
Type A. The most common is that in which $x_2$ is discrete as well as $x_1$.
In this case, it is evident that

(2.9.7)    $F(x_2' \mid x_1^{(\beta)}) = \dfrac{\sum_{x_2 \le x_2'} p(x_1^{(\beta)}, x_2)}{p_1(x_1^{(\beta)})},$

where $p(x_1, x_2)$ is the p.f. of $(x_1, x_2)$ and $p_1(x_1)$ is the marginal
p.f. of $x_1$. It will be convenient to drop superscripts and let

(2.9.8)    $p(x_2 \mid x_1) = \dfrac{p(x_1, x_2)}{p_1(x_1)},$

in which case $p(x_2 \mid x_1)$ is the p.f. of the conditional random variable
$x_2 \mid x_1$. Note that $x_2 \mid x_1$ will be a degenerate random variable
unless there are at least two distinct mass points having the given $x_1$
coordinate. Formula (2.9.8) can be rewritten as

(2.9.9)    $p(x_1, x_2) = p(x_2 \mid x_1) \cdot p_1(x_1),$

which provides a method of determining the p.f. of $(x_1, x_2)$ in two steps,
that is, by finding the p.f. of $x_2 \mid x_1$ and the p.f. of $x_1$ and
multiplying the two p.f.'s. The (marginal) p.f. of $x_2$, which is often the
objective in such a problem, can then be determined in the usual way from
$p(x_1, x_2)$.


Examples. Suppose we wish to find the probability $p(x_2 \mid x_1)$ that a
hand of bridge, known to contain $x_1$ aces, has $x_2$ kings. The probability
$p(x_1, x_2)$ of getting $x_1$ aces and $x_2$ kings in a hand is given by

    $p(x_1, x_2) = \dfrac{\binom{4}{x_1}\binom{4}{x_2}\binom{44}{13 - x_1 - x_2}}{\binom{52}{13}}.$

Now $p_1(x_1) = \sum_{x_2 = 0}^{4} p(x_1, x_2)$, the probability that the hand
contains $x_1$ aces and $13 - x_1$ cards which are not aces, is given by

    $p_1(x_1) = \dfrac{\binom{4}{x_1}\binom{48}{13 - x_1}}{\binom{52}{13}}.$

Therefore, applying (2.9.8), we have

    $p(x_2 \mid x_1) = \dfrac{\binom{4}{x_2}\binom{44}{13 - x_1 - x_2}}{\binom{48}{13 - x_1}}.$

Now consider an example to illustrate (2.9.9). Suppose $x_1$ is a random
variable denoting the number of dots appearing when a “true” die is thrown.
If $x_1 = x_1'$, let $x_1'$ coins be tossed, and let $x_2$ be a random variable
denoting the number of heads obtained. The interesting problem here is to
determine $p_2(x_2)$, the probability that the number of heads resulting from
the entire experiment is $x_2$.

We have $p(x_2 \mid x_1) = \binom{x_1}{x_2}\left(\tfrac{1}{2}\right)^{x_1}$ and
$p_1(x_1) = \tfrac{1}{6}$.

Hence $p(x_1, x_2) = \tfrac{1}{6}\binom{x_1}{x_2}\left(\tfrac{1}{2}\right)^{x_1}$.

To find $p_2(x_2)$, we sum $p(x_1, x_2)$ with respect to $x_1$ over
$x_2, x_2 + 1, \ldots, 6$. This gives:

    $\dfrac{63}{384},\ \dfrac{120}{384},\ \dfrac{99}{384},\ \dfrac{64}{384},\ \dfrac{29}{384},\ \dfrac{8}{384},\ \dfrac{1}{384},$

for $x_2 = 0, 1, 2, 3, 4, 5, 6$, respectively.
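Both examples can be checked by direct computation. The sketch below evaluates the conditional p.f. of kings given aces from (2.9.8), and the marginal p.f. of the number of heads by summing (2.9.9) over the die outcomes; the function names are hypothetical and exact rational arithmetic is used so that the fractions can be compared with those quoted in the text.

```python
from fractions import Fraction
from math import comb

def bridge_joint(x1, x2):
    """p(x1, x2): probability of x1 aces and x2 kings in a 13-card hand."""
    rest = 13 - x1 - x2
    return Fraction(comb(4, x1) * comb(4, x2) * comb(44, rest), comb(52, 13))

def bridge_conditional(x2, x1):
    """p(x2 | x1) from (2.9.8): joint p.f. divided by the marginal p.f. of x1."""
    marginal = sum(bridge_joint(x1, k) for k in range(5))
    return bridge_joint(x1, x2) / marginal

def heads_marginal(x2):
    """p_2(x2) for the die-and-coins experiment, summing (2.9.9) over x1."""
    total = Fraction(0)
    for x1 in range(max(x2, 1), 7):
        # p(x2 | x1) * p_1(x1): binomial(x1, 1/2) probability times 1/6
        total += Fraction(comb(x1, x2), 2 ** x1) * Fraction(1, 6)
    return total

heads = [heads_marginal(k) for k in range(7)]
```

The list `heads` reproduces the seven fractions with denominator 384 given above, and the conditional p.f. of kings sums to 1 over $x_2 = 0, \ldots, 4$ for any admissible $x_1$.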


The second, but less common, case of a Type A conditional random
variable is that in which $F(x_2 \mid x_1^{(\beta)})$ is absolutely continuous
in $x_2$, thus possessing a density function $f(x_2 \mid x_1^{(\beta)})$ such
that

(2.9.10)    $f(x_2 \mid x_1^{(\beta)}) = \dfrac{\partial}{\partial x_2} F(x_2 \mid x_1^{(\beta)}).$

This follows, of course, at once from (2.9.2) for $k = 2$ and for $G$ taken
as the set of points in $R_2$ for which $x_1 = x_1^{(\beta)}$. It can also be
obtained from (2.9.5) by choosing $x_1' = x_1^{(\beta)}$.

Here $x_2 \mid x_1^{(\beta)}$ is a continuous conditional random variable with
p.d.f. $f(x_2 \mid x_1^{(\beta)})$. Note that when expressed in terms of
$F(x_1, x_2)$ the function $f(x_2 \mid x_1^{(\beta)})$ is given by (2.5.12)
with $\alpha = \beta$.

Type B. In this type, $x_1$ is a continuous random variable with p.d.f.
$f_1(x_1)$, and there are two useful cases. In the more common one
$(x_1, x_2)$ is a continuous random variable with p.d.f. $f(x_1, x_2)$, and
(2.9.4) becomes

(2.9.11)    $F(x_1', x_2' \mid I_1) = \dfrac{\int_{x_1''}^{x_1'} \int_{-\infty}^{x_2'} f(x_1, x_2)\, dx_2\, dx_1}{\int_{x_1''}^{x_1'} f_1(x_1)\, dx_1}.$

If $f_1(x_1)$ and $\int_{-\infty}^{x_2'} f(x_1, x_2)\, dx_2$ are continuous
functions of $x_1$ at $x_1'$ with $f_1(x_1') > 0$, then taking limits as
$x_1'' \to x_1'$, we obtain

(2.9.12)    $F(x_2' \mid x_1') = \dfrac{\int_{-\infty}^{x_2'} f(x_1', x_2)\, dx_2}{f_1(x_1')}.$

The reader will note that this expression for $F(x_2' \mid x_1')$ for the
continuous case is suggested by the analogous formula (2.9.7) for the
discrete case. If we set

(2.9.13)    $f(x_2 \mid x_1') = \dfrac{f(x_1', x_2)}{f_1(x_1')},$

then $f(x_2 \mid x_1')$ is a p.d.f., namely, that of the conditional random
variable $x_2 \mid x_1'$. When there is no danger of confusion we may drop
the dash from $x_1'$ and rewrite (2.9.13) as

(2.9.14)    $f(x_1, x_2) = f(x_2 \mid x_1) \cdot f_1(x_1),$

which, of course, is the analogue of (2.9.9) for continuous random
variables.


Example. Suppose $x_1$ is a random variable denoting a number picked “at
random” from the interval (0,1) (all numbers on the interval (0,1) are to be
regarded “equally likely in the elementary geometric sense”), and let $x_2$
be a random variable denoting a number picked at random from the interval
$(x_1', 1)$ where $x_1'$ is the realized value of $x_1$. We wish to find the
p.d.f. of $x_2$, that is, $f_2(x_2)$. We have

    $f_1(x_1) = 1$ for $0 < x_1 < 1$, and $0$ otherwise;

    $f(x_2 \mid x_1) = \dfrac{1}{1 - x_1}$ for $x_1 < x_2 < 1$, and $0$ otherwise.

Hence

    $f(x_1, x_2) = \dfrac{1}{1 - x_1}$ for $0 < x_1 < x_2 < 1$, and $0$ otherwise in $R_2$.

Therefore

    $f_2(x_2) = \int_0^{x_2} \dfrac{dx_1}{1 - x_1} = -\log_e (1 - x_2), \qquad 0 < x_2 < 1,$

and $f_2(x_2) = 0$ for $x_2$ outside (0,1).
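The marginal p.d.f. obtained in this example can be checked numerically. The sketch below integrates the joint p.d.f. $1/(1 - x_1)$ over $0 < x_1 < x_2$ by the midpoint rule and compares the result with $-\log_e(1 - x_2)$; the function name, the number of steps and the test points are illustrative assumptions.

```python
import math

def marginal_density(x2, steps=100_000):
    """Approximate f_2(x2), the integral over 0 < x1 < x2 of 1 / (1 - x1),
    with the midpoint rule."""
    if not 0.0 < x2 < 1.0:
        return 0.0
    h = x2 / steps
    return h * sum(1.0 / (1.0 - (i + 0.5) * h) for i in range(steps))

# Pairs (numerical value, closed form) at a few hypothetical test points.
checks = {x2: (marginal_density(x2), -math.log(1.0 - x2)) for x2 in (0.1, 0.5, 0.9)}
```

Each pair in `checks` should agree to several decimal places, which is what the closed form predicts.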


In the second, less frequent case of a Type B conditional random variable,
$x_1$ is continuous, and $x_2$ is discrete, and the reader can verify that
if $f_1(x_1)$ is continuous at $x_1'$ with $f_1(x_1') > 0$, and if
$F(x_1, x_2)$ and $F(x_1, x_2 - 0)$ possess derivatives with respect to $x_1$
at $x_1'$, then application of (2.9.4) and (2.9.5) yields

(2.9.15)    $F(x_2' \mid x_1') = \sum_{x_2^{(\alpha)} \le x_2'} p^*(x_2^{(\alpha)} \mid x_1'),$

where

(2.9.16)    $p^*(x_2 \mid x_1') = \dfrac{f^*(x_1', x_2)}{f_1(x_1')}$

and

(2.9.17)    $f^*(x_1, x_2) = \dfrac{\partial}{\partial x_1} \left[ F(x_1, x_2) - F(x_1, x_2 - 0) \right].$

Note that $p^*(x_2 \mid x_1)$ is similar in structure to $p(x_2 \mid x_1)$ as
given by (2.9.8) except that $p^*(x_2 \mid x_1)$ is constructed from
probability densities rather than probabilities (at mass points). Roughly
speaking, we may think of breaking up the probability 1 into pieces
$p_2(x_2^{(1)}), p_2(x_2^{(2)}), \ldots$ and then “smearing” these pieces of
probability continuously along the lines $x_2 = x_2^{(1)}$,
$x_2 = x_2^{(2)}, \ldots$ in accordance with the density functions (not
p.d.f.'s) $f^*(x_1, x_2^{(1)}), f^*(x_1, x_2^{(2)}), \ldots$, respectively.


(c) Extensions to k-Dimensional Random Variables


The conditional c.d.f. defined by (2.9.5) can be extended in a
straightforward manner to multidimensional random variables. Suppose
$(x_1, \ldots, x_k)$ is a random variable with c.d.f. $F(x_1, \ldots, x_k)$.
Let $F_{1 \cdots k_1}(x_1, \ldots, x_{k_1})$ be the marginal c.d.f. of
$(x_1, \ldots, x_{k_1})$, $k_1 < k = k_1 + k_2$.

Let $I_{k_1}$ be the cylinder set in $R_k$ for which $x_i'' < x_i \le x_i'$,
$i = 1, \ldots, k_1$. The projection of this cylinder set onto the sample
space of $(x_1, \ldots, x_{k_1})$ is the $k_1$-dimensional interval
$(x_1'', \ldots, x_{k_1}''; x_1', \ldots, x_{k_1}']$, which we are also calling
$I_{k_1}$. Let $\Delta_{I_{k_1}} F_{1 \cdots k_1}$ be the $k_1$th difference of
$F_{1 \cdots k_1}(x_1, \ldots, x_{k_1})$ over $I_{k_1}$ defined in accordance
with (2.6.3) and (2.6.4). This $k_1$th difference is simply $P(I_{k_1})$,
which is assumed $> 0$. Now let us put

(2.9.18)    $F(x_1', \ldots, x_k' \mid I_{k_1}) = \dfrac{P(E_{x_1' \cdots x_k'} \cap I_{k_1})}{P(I_{k_1})}.$

In particular, we have

(2.9.19)    $F(x_1', \ldots, x_k' \mid I_{k_1}) = \dfrac{\Delta_{I_{k_1}} F(x_1, \ldots, x_{k_1}, x_{k_1+1}', \ldots, x_k')}{\Delta_{I_{k_1}} F_{1 \cdots k_1}(x_1, \ldots, x_{k_1})},$

where the numerator on the right is the $k_1$th difference of
$F(x_1, \ldots, x_k)$ with respect to $(x_1, \ldots, x_{k_1})$ for fixed values
of $(x_{k_1+1}, \ldots, x_k)$.

If the limit of the right-hand side of (2.9.19) exists as
$x_1'' \to x_1', \ldots, x_{k_1}'' \to x_{k_1}'$, that is, if

(2.9.20)    $\lim F(x_1', \ldots, x_k' \mid I_{k_1}) = F(x_{k_1+1}', \ldots, x_k' \mid x_1', \ldots, x_{k_1}')$

exists, then dropping dashes,
$F(x_{k_1+1}, \ldots, x_k \mid x_1, \ldots, x_{k_1})$ is called the c.d.f. of
the conditional random variable
$(x_{k_1+1}, \ldots, x_k \mid x_1, \ldots, x_{k_1})$. It can be verified that
if it exists, $F(x_{k_1+1}, \ldots, x_k \mid x_1, \ldots, x_{k_1})$ has all of
the properties (2.6.8) of a $k_2$-dimensional c.d.f. If the limit in (2.9.20)
does not exist, the more general approach discussed in Section 1.10 is
required.

In the case of a discrete random variable $(x_1, \ldots, x_k)$, if $G$ in
(2.9.2) is chosen as the section of $R_k$ for which
$x_1 = x_1', \ldots, x_{k_1} = x_{k_1}'$ so that $P(G) > 0$, then the
conditional random variable
$(x_{k_1+1}, \ldots, x_k \mid x_1', \ldots, x_{k_1}')$ exists and has a p.f.
with the following formula, which is an extension of (2.9.8):

(2.9.21)    $p(x_{k_1+1}, \ldots, x_k \mid x_1', \ldots, x_{k_1}') = \dfrac{p(x_1', \ldots, x_{k_1}', x_{k_1+1}, \ldots, x_k)}{p_{1 \cdots k_1}(x_1', \ldots, x_{k_1}')}.$

If we drop dashes, $p(x_1, \ldots, x_k)$ is the p.f. of $(x_1, \ldots, x_k)$
and $p_{1 \cdots k_1}(x_1, \ldots, x_{k_1})$ is the marginal p.f. of
$(x_1, \ldots, x_{k_1})$. Under the conditions we have stated it can also be
shown that the c.d.f. of the conditional random variable
$(x_{k_1+1}, \ldots, x_k \mid x_1, \ldots, x_{k_1})$ [having p.f. (2.9.21)] is
given by (2.9.20).

The reader can verify that if $(x_1, \ldots, x_k)$ is discrete we have the
following multiplication formula for p.f.'s at mass points of the space of
$(x_1, \ldots, x_k)$:

(2.9.22)    $p(x_1, \ldots, x_k) = p(x_k \mid x_1, \ldots, x_{k-1}) \cdot p(x_{k-1} \mid x_1, \ldots, x_{k-2}) \cdots p(x_2 \mid x_1) \cdot p_1(x_1).$
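The multiplication formula (2.9.22) can be illustrated by building a joint p.f. one conditional factor at a time. The sketch below does this for chips drawn successively without replacement from a jar, where each conditional p.f. is the reciprocal of the number of chips remaining; the jar size `N`, the number of draws `K` and the function name `joint_pf` are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import permutations

def joint_pf(draws, n_chips):
    """p(x_1, ..., x_k) built from (2.9.22) for draws without replacement.

    Each factor p(x_j | x_1, ..., x_{j-1}) is 1 / (number of chips left).
    """
    if len(set(draws)) != len(draws):
        return Fraction(0)  # a chip cannot be drawn twice
    prob = Fraction(1)
    for already_drawn in range(len(draws)):
        prob *= Fraction(1, n_chips - already_drawn)
    return prob

N, K = 5, 3  # hypothetical jar size and number of draws
points = list(permutations(range(1, N + 1), K))
total = sum(joint_pf(pt, N) for pt in points)
```

With these values there are 60 mass points, each of probability 1/60, so `total` equals 1.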


In case $(x_1, \ldots, x_k)$ is a continuous random variable, we have,
corresponding to (2.9.21), the following p.d.f. of
$(x_{k_1+1}, \ldots, x_k \mid x_1, \ldots, x_{k_1})$:

(2.9.23)    $f(x_{k_1+1}, \ldots, x_k \mid x_1, \ldots, x_{k_1}) = \dfrac{f(x_1, \ldots, x_k)}{f_{1 \cdots k_1}(x_1, \ldots, x_{k_1})},$

where the two functions on the right are p.d.f.'s as defined in Section 2.7.
This p.d.f. may be derived from (2.9.20) by straightforward extension of the
argument and assumptions used in establishing (2.9.12).

Corresponding to (2.9.22) we have, of course, for the continuous case,
the following multiplication formula for p.d.f.'s:

(2.9.24)    $f(x_1, \ldots, x_k) = f(x_k \mid x_1, \ldots, x_{k-1}) \cdot f(x_{k-1} \mid x_1, \ldots, x_{k-2}) \cdots f(x_2 \mid x_1) \cdot f_1(x_1),$

assuming that the p.d.f.'s of all indicated conditional random variables
exist.


(d) Conditional Distribution Functions in Case of Independence 

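Consider first the random variable $(x_1, x_2)$. Let $G$ be a Borel cylinder
set in $R_2$ parallel to the $x_2$-axis such that $P(G) > 0$. There is no
ambiguity if we refer to a point $(x_1, x_2)$ of $G$ by writing $x_1 \in G$.
Now $E_{x_1' x_2'}$ is the Cartesian product $E_{x_1'} \times E_{x_2'}$ where
$E_{x_1'}$ is the set on the $x_1$-axis for which $x_1 \le x_1'$ and
$E_{x_2'}$ is the set on the $x_2$-axis for which $x_2 \le x_2'$. Then

    $E_{x_1' x_2'} \cap G = (E_{x_1'} \cap G) \times E_{x_2'}$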





and we have from (2.9.2)

(2.9.25)    $F(x_1', x_2' \mid G) = \dfrac{P\bigl((E_{x_1'} \cap G) \times E_{x_2'}\bigr)}{P(G)}.$

If $x_1$ and $x_2$ are independent, (2.9.25) reduces to

(2.9.26)    $F(x_1', x_2' \mid G) = \dfrac{P(E_{x_1'} \cap G)}{P(G)} \cdot P(E_{x_2'}),$

and therefore we obtain the following result (dropping dashes):

2.9.1 If $(x_1, x_2)$ is a random variable with c.d.f. $F(x_1, x_2)$, such
that $x_1$ and $x_2$ are independent, and if $G$ is any (Borel) cylinder set
in $R_2$ parallel to the $x_2$-axis for which $P(G) > 0$, then

(2.9.27)    $F(x_1, x_2 \mid G) = F_1(x_1 \mid G) \cdot F_2(x_2).$

This means that the conditional probability that $x_2$ belongs to any
specified set, given that $(x_1, x_2) \in G$, where $G$ is a cylinder set
parallel to the $x_2$-axis, does not depend on $G$. In particular, suppose
$G$ is the set for which $x_1 \in I_1$ used in (2.9.3). Then if $x_1$ and
$x_2$ are independent, it follows from 2.4.2 that (2.9.4) reduces to

(2.9.28)    $F(x_1', x_2' \mid I_1) = F_2(x_2').$

In this case $\lim_{x_1'' \to x_1'} F(x_1', x_2' \mid I_1)$ exists except
possibly for a set of values of $x_1'$ with probability 0, and is equal to
$F_2(x_2')$. Hence, we obtain the following corollary of 2.9.1:

2.9.1a If $x_1$ and $x_2$ are independent then, except possibly for a set of
probability 0,

(2.9.29)    $F(x_2 \mid x_1) = F_2(x_2).$

In case $x_1$ and $x_2$ are independent it is clear that (2.9.8) reduces to

(2.9.29a)    $p(x_2 \mid x_1) = p_2(x_2)$

and (2.9.13) reduces to

(2.9.29b)    $f(x_2 \mid x_1) = f_2(x_2).$
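Relation (2.9.29a) can be checked on a small discrete example. In the sketch below the two marginal p.f.'s are hypothetical; the joint p.f. is formed as their product, which is what independence means for discrete random variables, and the conditional p.f. computed from (2.9.8) is compared with the marginal p.f. of $x_2$.

```python
from fractions import Fraction

# Hypothetical marginal p.f.'s of two independent discrete random variables.
p1 = {0: Fraction(1, 4), 1: Fraction(3, 4)}
p2 = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

# Under independence the joint p.f. is the product of the marginals.
joint = {(a, b): p1[a] * p2[b] for a in p1 for b in p2}

def conditional_pf(x2, x1):
    """p(x2 | x1) = p(x1, x2) / p_1(x1), as in (2.9.8)."""
    marginal_x1 = sum(p for (a, _), p in joint.items() if a == x1)
    return joint[(x1, x2)] / marginal_x1

# True when p(x2 | x1) equals p_2(x2) at every mass point.
same = all(conditional_pf(b, a) == p2[b] for a in p1 for b in p2)
```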

More generally

2.9.2 If $(x_1, \ldots, x_k)$ is a random variable with c.d.f.
$F(x_1, \ldots, x_k)$, such that the random variables
$(x_1, \ldots, x_{k_1})$ and $(x_{k_1+1}, \ldots, x_k)$ are independent, then if
$G$ is any Borel cylinder set in $R_k$ defined by restricting values of
$x_1, \ldots, x_{k_1}$ to any set such that $P(G) > 0$, we have

(2.9.30)    $F(x_1, \ldots, x_k \mid G) = F_{1 \cdots k_1}(x_1, \ldots, x_{k_1} \mid G) \cdot F_{k_1+1 \cdots k}(x_{k_1+1}, \ldots, x_k).$

If $G$ is the interval for which $(x_1, \ldots, x_{k_1}) \in I_{k_1}$ used in
(2.9.19) and furthermore if $(x_1, \ldots, x_{k_1})$ and
$(x_{k_1+1}, \ldots, x_k)$ are independent, we have

    $F(x_1, \ldots, x_k) = F_{1 \cdots k_1}(x_1, \ldots, x_{k_1}) \cdot F_{k_1+1 \cdots k}(x_{k_1+1}, \ldots, x_k),$

and (2.9.19) reduces to

(2.9.31)    $F(x_1', \ldots, x_k' \mid I_{k_1}) = F_{k_1+1 \cdots k}(x_{k_1+1}', \ldots, x_k').$

Taking the limits as $x_1'' \to x_1', \ldots, x_{k_1}'' \to x_{k_1}'$, which
exist almost everywhere, we obtain as a corollary of 2.9.2,

2.9.2a If $(x_1, \ldots, x_k)$ is a random variable with c.d.f.
$F(x_1, \ldots, x_k)$ such that $(x_1, \ldots, x_{k_1})$ and
$(x_{k_1+1}, \ldots, x_k)$ are independent, then, except possibly for a set
of probability 0, we have

(2.9.32)    $F(x_{k_1+1}, \ldots, x_k \mid x_1, \ldots, x_{k_1}) = F_{k_1+1 \cdots k}(x_{k_1+1}, \ldots, x_k).$

If $(x_1, \ldots, x_{k_1})$ and $(x_{k_1+1}, \ldots, x_k)$ are independent, it
follows from (2.9.32) that (2.9.21) and (2.9.23) reduce to

(2.9.32a)    $p(x_{k_1+1}, \ldots, x_k \mid x_1, \ldots, x_{k_1}) = p_{k_1+1 \cdots k}(x_{k_1+1}, \ldots, x_k)$

and

(2.9.32b)    $f(x_{k_1+1}, \ldots, x_k \mid x_1, \ldots, x_{k_1}) = f_{k_1+1 \cdots k}(x_{k_1+1}, \ldots, x_k),$

respectively.


2.10 FINITE STOCHASTIC PROCESSES 


A $k$-dimensional random variable $(x_1, \ldots, x_k)$ is sometimes referred
to as a finite stochastic process, particularly in applications where the
components of $(x_1, \ldots, x_k)$ correspond to measurements on the outcomes
of a succession of physical operations in such a way, of course, that a
$k$-dimensional c.d.f. is determined.


Example. For instance, if $x_1$ is a random variable denoting a number taken
“at random” from the interval (0,1), all numbers being assumed “equally
likely,” whereas $x_2$ is a number taken “at random” from $(x_1, 1)$ and so
on for $k$ numbers, we have a $k$-dimensional random variable
$(x_1, \ldots, x_k)$, whose p.d.f. $f(x_1, \ldots, x_k)$ can be found by
applying (2.9.24), that is,

    $f(x_1, \ldots, x_k) = \dfrac{1}{(1 - x_{k-1})} \cdot \dfrac{1}{(1 - x_{k-2})} \cdots \dfrac{1}{(1 - x_1)}$

for $0 < x_1 < \cdots < x_k < 1$, and $f(x_1, \ldots, x_k) = 0$ otherwise. We
may then refer to $(x_1, \ldots, x_k)$ as a (finite) stochastic process for
describing the results of a succession of “cuts” of the interval (0,1) in
the manner described.

As he progresses through this book the reader will note many examples
of finite stochastic processes. Infinite stochastic processes, that is,
multidimensional random variables with infinitely many components, also
occur at various places. They are defined and discussed in Chapter 4.
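The succession of “cuts” in the example can be simulated directly. In the sketch below each $x_j$ is drawn uniformly from $(x_{j-1}, 1)$; since $1 - x_j$ is then $1 - x_{j-1}$ multiplied by an independent uniform number on (0,1), the mean of $x_k$ is $1 - 2^{-k}$, and this is used as a check. The seed, the sample size and the function name are arbitrary assumptions.

```python
import random

def successive_cuts(k, rng):
    """Draw (x_1, ..., x_k): x_1 uniform on (0, 1), x_j uniform on (x_{j-1}, 1)."""
    cuts, lower = [], 0.0
    for _ in range(k):
        lower = rng.uniform(lower, 1.0)
        cuts.append(lower)
    return cuts

rng = random.Random(12345)  # fixed seed so the run is reproducible
samples = [successive_cuts(3, rng) for _ in range(200_000)]
mean_last = sum(s[-1] for s in samples) / len(samples)
```

For $k = 3$ the sample mean `mean_last` should lie close to $1 - 2^{-3} = 0.875$.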


PROBLEMS 


2.1 If $F(x)$ is the c.d.f. of a random variable $x$, establish formulas
(2.2.7), (2.2.8), (2.2.9), and (2.2.10) by considering appropriate sequences
of half-open intervals and 1.4.5.

2.2 By considering a suitable sequence of two-dimensional half-open 
intervals establish (2.4.9). 

2.3 Let $F_1(x)$ and $F_2(x)$ be c.d.f.'s of a discrete and a continuous
random variable respectively. If $a$ and $b$ are non-negative numbers whose
sum is 1, show that

    $aF_1(x) + bF_2(x)$

has all of the properties of a c.d.f.

2.4 A mixed two-dimensional random variable $(x_1, x_2)$ is such that $x_1$
is a discrete random variable with p.f. $p(x_1) = (\tfrac{1}{2})^{x_1}$,
$x_1 = 1, 2, \ldots$ and $x_2$ is a continuous random variable such that the
conditional random variable $x_2 \mid x_1$ has p.d.f.
$x_1(1 - x_2)^{x_1 - 1}$ on the interval (0,1). Show that the p.d.f. of the
unconditional random variable $x_2$ is $2(1 + x_2)^{-2}$ on (0,1).


2.5 A discrete random variable $x_1$ has p.f. $pq^{x_1 - 1}$,
$x_1 = 1, 2, \ldots$ where $0 < p < 1$, $q = 1 - p$, whereas $x_2$ is a
continuous random variable such that the p.d.f. of $x_2 \mid x_1$ is
$x_1 x_2^{x_1 - 1}$ on (0,1) and 0 otherwise. Determine the unconditional
p.d.f. of $x_2$; the p.f. of $x_1 \mid x_2$; the c.d.f. of $x_1$; and the
c.d.f. of $x_2$.

2.6 If $G(x_1, x_2) = [1 - (1 + x_1)^{-k} - (1 + x_2)^{-k} + (1 + x_1 + x_2)^{-k}]$,
where $k > 0$, at any point $(x_1, x_2)$ in the first quadrant of the
$x_1x_2$-plane, and 0 elsewhere, show that $G(x_1, x_2)$ satisfies all
conditions for the c.d.f. of a two-dimensional continuous random variable
$(x_1, x_2)$ and find its p.d.f.

2.7 If $(x, y)$ is a pair of continuous random variables whose p.d.f. is
$f(x, y)$ for $x > 0$, $y > 0$, and 0 elsewhere, show that

(a) the p.d.f. of $u = y/x$ is

    $\int_0^{\infty} x f(x, ux)\, dx.$

(b) the p.d.f. of $v = x + y$ is

    $\int_0^{v} f(x, v - x)\, dx.$


(c) the p.d.f. of $w = xy$ is

    $\int_0^{\infty} \dfrac{1}{x} f\!\left(x, \dfrac{w}{x}\right) dx.$

2.8 Suppose the random variables $(x_1, x_2)$ have p.d.f.
$\dfrac{1}{2\pi} e^{-\frac{1}{2}(x_1^2 + x_2^2)}$ at every point $(x_1, x_2)$ of
the $x_1x_2$-plane. If $y_1$ and $y_2$ are random variables related to $x_1$
and $x_2$ as follows:

    $x_1 = y_1 \cos y_2$
    $x_2 = y_1 \sin y_2$

where the p.d.f. of $y_1$ is $y_1 e^{-\frac{1}{2} y_1^2}$ for $y_1 > 0$, and 0
for $y_1 < 0$; and the p.d.f. of $y_2$ is $1/(2\pi)$ for $0 < y_2 < 2\pi$, and
0 otherwise, show that $y_1$ and $y_2$ are independent.

2.9 A three-dimensional random variable $(x_1, x_2, x_3)$ has p.d.f. 6 in the
tetrahedron having vertices (0,0,0), (0,0,1), (0,1,0), (1,0,0) and 0 outside.
Find the c.d.f. and p.d.f. of: $(x_1, x_2)$; $x_1$; $x_1 \mid x_2$;
$x_1 + x_2$; and $x_1 + x_2 + x_3$.

2.10 If $(x_1, x_2, x_3)$ are non-negative random variables having p.d.f.
$e^{-(x_1 + x_2 + x_3)}$, find the p.d.f. of $(u, x_2, x_3)$ where
$u = x_1 + x_2 + x_3$. From the p.d.f. of $(u, x_2, x_3)$ find the p.d.f. of

(a) $u$,

(b) $u \mid x_2$,

(c) $u \mid x_2, x_3$,

(d) $(x_2, x_3 \mid u)$.

2.11 If $(x_1, \ldots, x_k)$ are independent random variables having c.d.f.
$F(x_1, \ldots, x_k)$, show that in (2.6.4)

    $\Delta F(x_1, \ldots, x_k) = \prod_{i=1}^{k} \Delta F_i(x_i),$

where $\Delta F_i(x_i) = F_i(x_i') - F_i(x_i'')$,

and $F_i(x_i)$ is the marginal c.d.f. of $x_i$.

2.12 If $(x_1, \ldots, x_k)$ is a $k$-dimensional random variable having a
p.d.f. which is symmetric in $x_1, \ldots, x_k$, show that
$P(x_1 < x_2 < \cdots < x_k) = 1/k!$.

2.13 A $k$-dimensional random variable $(x_1, \ldots, x_k)$ is known to have
a p.d.f. of form

    $C(A + x_1 + \cdots + x_k)^{-B}$

for $x_1 > 0, \ldots, x_k > 0$ and 0 elsewhere, where $A$ and $B$ are
positive. Determine $C$. Find the p.d.f. of the marginal distribution of
$(x_1, \ldots, x_r)$, $r < k$.

2.14 If $(x_1, \ldots, x_k)$ are independent random variables with identical
continuous c.d.f.'s $F(x_1), \ldots, F(x_k)$, and if

    $u = \min(x_1, \ldots, x_k)$

    $v = \max(x_1, \ldots, x_k),$

show that the c.d.f. of $(u, v)$ is

    $[F(v)]^k - g(u, v)[F(v) - F(u)]^k,$

where $g(u, v) = 1$ for $u < v$, $= 0$ for $u > v$.

Find the marginal c.d.f.'s of $u$ and of $v$. Also if $F(x)$ has a derivative
$f(x)$ find the p.d.f.'s of $(u, v)$, $u$, and $v$.

2.15 A jar has $N$ chips numbered $1, 2, \ldots, N$. A chip is drawn, its
number denoted by a random variable $x_1$, and it is replaced. A second chip
is drawn, its number denoted by a random variable $x_2$, and it is replaced.
And so on until $k$ chips are drawn and replaced. Assigning equal
probabilities to all $N$ numbers at each step and mutual independence of
$x_1, \ldots, x_k$, write down the c.d.f. of the $k$-dimensional random
variable $(x_1, \ldots, x_k)$. If $y + 1$ is the smallest integer which
exceeds every number drawn find the c.d.f. and then the p.f. of $y$.

2.16 (Continuation) Suppose the $k$ ($< N$) chips are drawn successively
without replacement, the probability of drawing any chip remaining in the
jar at any stage being assigned so as to be equal to that of drawing any
other chip remaining at that stage. How many mass points does
$(x_1, \ldots, x_k)$ have? Determine the p.f. of $(x_1, \ldots, x_k)$. Show
that the p.f. of the marginal distribution of any set of $r$ ($r < k$) of the
$x$'s is the same as that of any other set of $r$ of the $x$'s.

2.17 If $(x_1, y_1), \ldots, (x_k, y_k)$ are $k$ independent two-dimensional
random variables with identical c.d.f.'s $F(x_1, y_1), \ldots, F(x_k, y_k)$
(and identical p.d.f.'s $f(x_1, y_1), \ldots, f(x_k, y_k)$), and if

    $u = \max(x_1, \ldots, x_k)$

    $v = \max(y_1, \ldots, y_k),$

show that the p.d.f. of $(u, v)$ is

    $k(k - 1)F^{k-2}(u, v)\, \dfrac{\partial F}{\partial u}\, \dfrac{\partial F}{\partial v} + kF^{k-1}(u, v) f(u, v).$

2.18 (Continuation) If $w = \max(x_1 + y_1, \ldots, x_k + y_k)$ show that the
p.d.f. of $w$ is

    $k \left[ \int_{-\infty}^{\infty} \int_{-\infty}^{w - x} f(x, y)\, dy\, dx \right]^{k-1} \int_{-\infty}^{\infty} f(x, w - x)\, dx.$

2.19 Show that if each of the random variables $x_1, \ldots, x_k$ is
independent of the remaining ones, then they are mutually independent.

2.20 If $F(x_1, \ldots, x_k)$ is a $k$-dimensional c.d.f. with marginal
c.d.f.'s $F_1(x_1), \ldots, F_k(x_k)$ show that

    $F(x_1, \ldots, x_k) \le [F_1(x_1) \cdots F_k(x_k)]^{1/k}.$



CHAPTER 3 


Mean Values and Moments of 
Random Variables 


3.1 INTRODUCTION 

In Section 1.8 we have defined, for a given probability space
$(R, \mathscr{B}, P)$, the Lebesgue-Stieltjes integral of a random variable
$x(e)$ with respect to $P$ over a set $E \in \mathscr{B}$.

Let $E'$ be a Borel set in $R_1$, and let $E$ be the set in $R$ for which
$x(e) \in E'$. Then if $F(x)$ is the c.d.f. of $x(e)$ we may write the
Lebesgue-Stieltjes integral (1.8.5) of $x(e)$ over $E$ in either of the two
following equal forms:

(3.1.1)    $\int_E x(e)\, dP(e) = \int_{E'} x\, dF(x).$

Similarly, let $E_k'$ be a Borel set in $R_k$, and let $E$ be the set in $R$
for which $(x_1(e), \ldots, x_k(e)) \in E_k'$. If $F(x_1, \ldots, x_k)$ is the
c.d.f. of $(x_1(e), \ldots, x_k(e))$, and if $g(x_1, \ldots, x_k)$ is
measurable relative to $\mathscr{B}_k$ [thus making
$g(x_1(e), \ldots, x_k(e))$ measurable relative to $\mathscr{B}$], we can then
write

(3.1.2)    $\int_E g(x_1(e), \ldots, x_k(e))\, dP(e) = \int_{E_k'} g(x_1, \ldots, x_k)\, dF(x_1, \ldots, x_k).$

Suppose $E_k'$ in (3.1.2) is the set of points in $R_k$ for which
$g(x_1, \ldots, x_k) \in E_1'$ where $E_1'$ is a Borel set in $R_1$. Then if
$H(y)$ is the c.d.f. of $g(x_1, \ldots, x_k)$ we can further write

(3.1.3)    $\int_{E_k'} g(x_1, \ldots, x_k)\, dF(x_1, \ldots, x_k) = \int_{E_1'} y\, dH(y).$


3.2 MEAN VALUE OF A RANDOM VARIABLE

If in (3.1.1) we take $R_1$ as the set $E'$, then the resulting integral
defines the mean value of the random variable $x$, that is, we write

(3.2.1)    $\mathscr{E}(x) = \int_{-\infty}^{\infty} x\, dF(x).$

In case $x$ is a discrete random variable with p.f. $p(x)$ as defined in
Section 2.3(a), $\mathscr{E}(x)$ reduces to the following sum:

(3.2.2)    $\mathscr{E}(x) = \sum_{\alpha} x_{\alpha}\, p(x_{\alpha}),$

where the $x_{\alpha}$ are the mass points of the random variable $x$.

If $x$ is a continuous random variable with p.d.f. $f(x)$ we have

(3.2.3)    $\mathscr{E}(x) = \int_{-\infty}^{\infty} x\, dF(x) = \int_{-\infty}^{\infty} x f(x)\, dx.$

More generally, if in (3.1.2) we take for $E_k'$ the entire space $R_k$ we
obtain

(3.2.4)    $\mathscr{E}(g(x_1, \ldots, x_k)) = \int_{R_k} g(x_1, \ldots, x_k)\, dF(x_1, \ldots, x_k).$

Furthermore, it follows from (3.1.3) that if $E_k' = R_k$, then $E_1' = R_1$
and we have

(3.2.5)    $\mathscr{E}(g(x_1, \ldots, x_k)) = \int_{-\infty}^{\infty} y\, dH(y).$

If we denote $g(x_1, \ldots, x_k)$ by $y$, then the integral in (3.2.4) is
simply $\mathscr{E}(y)$ and we have

(3.2.6)    $\mathscr{E}(g(x_1, \ldots, x_k)) = \mathscr{E}(y).$

If $(x_1, \ldots, x_k)$ is a discrete random variable, then (3.2.4) reduces to
a sum and if $(x_1, \ldots, x_k)$ is continuous, (3.2.4) is a $k$-dimensional
integral over $R_k$.
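The equality (3.2.6) says that the mean of $g(x)$ may be computed either from the distribution of $x$ or from the distribution of $y = g(x)$. The sketch below checks this for a fair die and an arbitrarily chosen function `g`; both the die and the function are assumptions made only for illustration.

```python
from fractions import Fraction

# Hypothetical discrete random variable: a fair die.
pf = {face: Fraction(1, 6) for face in range(1, 7)}

def g(x):
    return (x - 3) ** 2  # an arbitrary function chosen for illustration

# Mean of x from (3.2.2).
mean_x = sum(x * p for x, p in pf.items())

# Mean of g(x) as a sum against the p.f. of x, the discrete form of (3.2.4).
mean_g_direct = sum(g(x) * p for x, p in pf.items())

# Mean of y = g(x) from the p.f. of y itself, as in (3.2.5) and (3.2.6).
pf_y = {}
for x, p in pf.items():
    pf_y[g(x)] = pf_y.get(g(x), Fraction(0)) + p
mean_g_via_y = sum(y * p for y, p in pf_y.items())
```

Both routes give the same value, 19/6, while the mean of $x$ itself is 7/2.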

If $g(x_1, \ldots, x_k)$ is a random variable, which for the moment we may
write as $y$, mean values when they exist have the following useful
properties:

3.2.1 If $c$ is a constant, $\mathscr{E}(cy) = c\,\mathscr{E}(y)$.

3.2.2 $\mathscr{E}(ay_1 + by_2) = a\,\mathscr{E}(y_1) + b\,\mathscr{E}(y_2)$.

3.2.3 If $m \le y \le M$, then $m \le \mathscr{E}(y) \le M$.

3.2.4 If $y_1 \le y_2$, then $\mathscr{E}(y_1) \le \mathscr{E}(y_2)$.

3.2.5 $|\mathscr{E}(y)| \le \mathscr{E}(|y|)$.

Suppose $(x_1, x_2)$ is a random variable with c.d.f. $F(x_1, x_2)$ such that
$x_1$ and $x_2$ are independent and consider the mean value of the product
$x_1x_2$. It can be verified that

(3.2.7)    $\mathscr{E}(x_1x_2) = \int_{-\infty}^{\infty} x_1\, dF_1(x_1) \cdot \int_{-\infty}^{\infty} x_2\, dF_2(x_2) = \mathscr{E}(x_1) \cdot \mathscr{E}(x_2).$

Thus we have the result that

3.2.6 If $(x_1, x_2)$ is a random variable whose components $x_1$ and $x_2$
are independent, then

    $\mathscr{E}(x_1x_2) = \mathscr{E}(x_1) \cdot \mathscr{E}(x_2).$

A similar result holds, of course, for $k$ mutually independent random
variables.
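Statement 3.2.6 can be illustrated with two small discrete distributions. In the sketch below the marginal p.f.'s are hypothetical; the product rule is checked for the independent case and, for contrast, is shown to fail when the second variable is taken equal to the first.

```python
from fractions import Fraction

# Hypothetical marginal p.f.'s.
p1 = {1: Fraction(1, 2), 2: Fraction(1, 2)}
p2 = {0: Fraction(1, 3), 3: Fraction(2, 3)}

def mean(pf):
    return sum(x * p for x, p in pf.items())

# Independent case: the joint p.f. is the product p1(a) * p2(b).
mean_product_indep = sum(a * b * p1[a] * p2[b] for a in p1 for b in p2)

# Dependent case for contrast: x2 is taken equal to x1, so x1 * x2 = x1 ** 2.
mean_product_dep = sum(a * a * p for a, p in p1.items())
```

Here `mean_product_indep` equals `mean(p1) * mean(p2)`, whereas `mean_product_dep` differs from `mean(p1) ** 2`.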


3.3 MOMENTS OF ONE-DIMENSIONAL
RANDOM VARIABLES

There are many problems in mathematical statistics in which it is
difficult, or at least not feasible, to determine completely the c.d.f. of a
random variable. In such cases it is often possible to describe the
distribution of the random variable incompletely, although usefully, by
moments and certain functions of moments of the random variable.

Suppose $x$ is a random variable having c.d.f. $F(x)$. The mean value of $x$,
as defined in (3.2.1), is usually denoted by $\mathscr{E}(x)$ or $\mu(x)$,
that is

(3.3.1)    $\mathscr{E}(x) = \mu(x) = \int_{-\infty}^{\infty} x\, dF(x).$

The variance* of $x$ is defined as the mean value of the random variable
$(x - \mu(x))^2$, that is,

(3.3.2)    $\sigma^2(x) = \mathscr{E}(x - \mu(x))^2.$

The quantity $\sigma(x)$, the positive square root of $\sigma^2(x)$, is called
the standard deviation of $x$. The ratio $\sigma(x)/\mu(x)$ is called the
coefficient of variation of $x$.

If we use concepts and language of elementary mechanics, the mean of
the random variable $x$ can be interpreted as the center of gravity in $R_1$
of the probability distribution of $x$. The variance of $x$ can be
interpreted as the moment of inertia of the same probability distribution
about the center of gravity, and is an indication of the amount by which the
probability mass spreads (or concentrates) about the center of gravity.

* Sometimes the notation ave $(x)$ is used for $\mathscr{E}(x)$, and var $(x)$
is used for $\sigma^2(x)$.

When no ambiguity arises as to what random variable is involved, we
shall write $\mu$ rather than $\mu(x)$ and $\sigma$ rather than $\sigma(x)$.

Making use of (3.3.1) and the properties of mean values expressed in
3.2.1 and 3.2.2, we obtain

(3.3.3)    $\sigma^2 = \mathscr{E}(x^2 - 2x\mu + \mu^2) = \mathscr{E}(x^2) - \mu^2.$

If we take the mean value $\mathscr{E}(x - a)^2$ where $a$ is an arbitrary
constant, we have

(3.3.4)    $\mathscr{E}(x - a)^2 = \mathscr{E}[(x - \mu) + (\mu - a)]^2 = \sigma^2 + (\mu - a)^2,$

from which the following statement can be made:

3.3.1 The value of the constant $a$ which minimizes $\mathscr{E}(x - a)^2$ is
$\mu$ and the minimum value of $\mathscr{E}(x - a)^2$ is $\sigma^2$.
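The intermediate step in (3.3.4) may be written out as follows. It uses only 3.2.1, 3.2.2 and the fact that the mean value of $x - \mu$ is zero; the script $\mathscr{E}$ is the notation assumed here for the mean value operator.

```latex
\begin{aligned}
\mathscr{E}(x-a)^2
  &= \mathscr{E}\bigl[(x-\mu) + (\mu-a)\bigr]^2 \\
  &= \mathscr{E}(x-\mu)^2 + 2(\mu-a)\,\mathscr{E}(x-\mu) + (\mu-a)^2 \\
  &= \sigma^2 + (\mu-a)^2 .
\end{aligned}
```

Since $(\mu - a)^2 \ge 0$ with equality only for $a = \mu$, the minimum is attained at $a = \mu$.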

A useful inequality for the amount of probability in the “tails” of a
probability distribution is provided by Chebyshev's (1867) inequality stated
as follows:

3.3.2 If $x$ is a random variable having mean $\mu$ and variance
$\sigma^2 > 0$, then

(3.3.5)    $P(|x - \mu| \ge \lambda\sigma) \le \dfrac{1}{\lambda^2},$

where $\lambda$ is any positive constant.

To prove this statement we first cut the $x$-axis $R_1$ into three disjoint
intervals:

    $I = (-\infty, \mu - \lambda\sigma], \quad I' = (\mu - \lambda\sigma, \mu + \lambda\sigma), \quad I'' = [\mu + \lambda\sigma, +\infty).$

We can write the variance of $x$ as

(3.3.6)    $\sigma^2 = \int_I (x - \mu)^2\, dF(x) + \int_{I'} (x - \mu)^2\, dF(x) + \int_{I''} (x - \mu)^2\, dF(x).$

Dropping the middle term on the right-hand side of (3.3.6), we have

(3.3.7)    $\sigma^2 \ge \int_I (x - \mu)^2\, dF(x) + \int_{I''} (x - \mu)^2\, dF(x).$

By replacing $x$ by $\mu - \lambda\sigma$ in $(x - \mu)^2$ in the first
integral on the right and $x$ by $\mu + \lambda\sigma$ in $(x - \mu)^2$ in the
second, the inequality is preserved and we have

(3.3.8)    $\sigma^2 \ge \sigma^2\lambda^2 P(|x - \mu| \ge \lambda\sigma),$

which is equivalent to inequality (3.3.5).
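Inequality (3.3.5) can be checked exactly for a simple distribution. The sketch below uses a fair die (an assumption made for illustration) and compares the tail probability with $1/\lambda^2$ for a few hypothetical values of $\lambda$; the event $|x - \mu| \ge \lambda\sigma$ is tested through squares so that all arithmetic stays rational.

```python
from fractions import Fraction

pf = {face: Fraction(1, 6) for face in range(1, 7)}  # hypothetical: fair die
mu = sum(x * p for x, p in pf.items())
var = sum((x - mu) ** 2 * p for x, p in pf.items())

def tail_probability(lam):
    """P(|x - mu| >= lam * sigma), compared via squares to stay exact."""
    return sum(p for x, p in pf.items() if (x - mu) ** 2 >= lam ** 2 * var)

lams = [Fraction(1, 2), Fraction(1), Fraction(6, 5), Fraction(3, 2), Fraction(2)]
bounds_hold = all(tail_probability(lam) <= 1 / lam ** 2 for lam in lams)
```

For the die the bound is far from sharp: with $\lambda = 1$ the tail probability is 1/3 against a bound of 1.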

The $r$th moment $\mu_r'(x)$, where $r$ is a positive integer, is defined if
it exists as the mean value of the random variable $x^r$, that is

(3.3.9)    $\mu_r'(x) = \mathscr{E}(x^r).$

For convenience we shall occasionally let $r = 0$, in which case
$\mu_0'(x) = 1$. The question of whether $\mu_r'(x)$ exists and is finite is
simply a matter of whether the random variable $x^r$ is integrable over
$R_1$. The $r$th central moment $\mu_r(x)$ is defined as the mean value of
$(x - \mu)^r$, that is,

(3.3.10)    $\mu_r(x) = \mathscr{E}[(x - \mu)^r].$

If there is no ambiguity about what random variable we are talking
about, we shall denote $\mu_r'(x)$ and $\mu_r(x)$ by $\mu_r'$ and $\mu_r$
respectively.

It is clear that the $\mu_r$, $r = 1, 2, \ldots$, can be expressed as
polynomials in the $\mu_r'$ and conversely. Noting that $\mu_1' = \mu$, we
have

(3.3.11)    $\mu_1 = 0$
            $\mu_2 = \mu_2' - \mu^2$
            $\mu_3 = \mu_3' - 3\mu_2'\mu + 2\mu^3$

and conversely,

(3.3.12)    $\mu_2' = \mu_2 + \mu^2$
            $\mu_3' = \mu_3 + 3\mu_2\mu + \mu^3.$

Note that $\mu_2 = \sigma^2$.

The $r$th absolute moment $\nu_r'(x)$ is defined as

(3.3.13)    $\nu_r'(x) = \mathscr{E}(|x|^r).$

We shall ordinarily write $\nu_r'(x)$ as $\nu_r'$. We define the $r$th
absolute central moment as

(3.3.13a)    $\nu_r(x) = \mathscr{E}(|x - \mu|^r).$

More generally, we can similarly express the variance, and higher
moments of any random variable $g(x)$, in terms of the c.d.f. of $x$. For
instance, making use of (3.3.1) and (3.3.2) the mean and variance of $g(x)$
are defined as

(3.3.14)    $\mu(g(x)) = \mathscr{E}[g(x)]$ and $\sigma^2(g(x)) = \mathscr{E}\{(g(x))^2 - [\mu(g(x))]^2\}.$

In certain kinds of problems involving discrete random variables, it is
often convenient to determine the moments $\mu_r'$ by first evaluating
factorial moments. If we let

(3.3.15)    $x^{[r]} = x(x - 1) \cdots (x - r + 1),$

the $r$th factorial moment is defined as

(3.3.16)    $\mu_{[r]}(x) = \mathscr{E}(x^{[r]}),$

and if no ambiguity arises we shall often write $\mu_{[r]}(x)$ as
$\mu_{[r]}$. Expressed in terms of the ordinary moments, we have

(3.3.17)    $\mu_{[1]} = \mu_1'$
            $\mu_{[2]} = \mu_2' - \mu_1'$
            $\mu_{[3]} = \mu_3' - 3\mu_2' + 2\mu_1'$

and conversely,

(3.3.18)    $\mu_1' = \mu_{[1]}$
            $\mu_2' = \mu_{[2]} + \mu_{[1]}$
            $\mu_3' = \mu_{[3]} + 3\mu_{[2]} + \mu_{[1]}.$
Note that we have defined the various kinds of moments as “moments
of a random variable $x$.” They are sometimes referred to as “moments of
the probability distribution of $x$.” If $F(x)$ is the c.d.f. of $x$, these
various kinds of moments are sometimes referred to as “moments of $F(x)$.”
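The relations between ordinary, central and factorial moments can be verified on a small discrete distribution. The sketch below uses, as an assumed example, the number of heads in three tosses of a fair coin; the helper names `raw`, `central`, `falling` and `factorial` are hypothetical.

```python
from fractions import Fraction

# Hypothetical p.f.: number of heads in three tosses of a fair coin.
pf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def mean_of(func):
    return sum(func(x) * p for x, p in pf.items())

def raw(r):
    """Ordinary moment: mean value of x ** r."""
    return mean_of(lambda x: x ** r)

def central(r):
    """Central moment: mean value of (x - mu) ** r."""
    mu = raw(1)
    return mean_of(lambda x: (x - mu) ** r)

def falling(x, r):
    """x(x - 1)...(x - r + 1), the factorial power of (3.3.15)."""
    out = 1
    for j in range(r):
        out *= x - j
    return out

def factorial(r):
    """Factorial moment: mean value of the factorial power of x."""
    return mean_of(lambda x: falling(x, r))
```

For this distribution `raw(1)`, `raw(2)` and `raw(3)` are 3/2, 3 and 27/4, and the identities for the second and third central and factorial moments hold exactly.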


3.4 MOMENTS OF TWO-DIMENSIONAL 
RANDOM VARIABLES 

Suppose $(x_1, x_2)$ is a two-dimensional random variable with c.d.f.
$F(x_1, x_2)$. The mean values of $x_1$ and $x_2$ are

(3.4.1)    $\mu(x_1) = \mathscr{E}(x_1), \quad \mu(x_2) = \mathscr{E}(x_2),$

and are usually denoted by $\mu_1$ and $\mu_2$. The variances are

(3.4.2)    $\sigma^2(x_1) = \mathscr{E}(x_1 - \mu_1)^2, \quad \sigma^2(x_2) = \mathscr{E}(x_2 - \mu_2)^2,$

and are usually denoted by $\sigma_1^2$ and $\sigma_2^2$.

There is an ambiguity in the notation of (3.3.10) and (3.4.1) when $\mu_1$
is written for $\mu(x_1)$ and $\mu_2$ for $\mu(x_2)$. This ambiguity,
however, is unimportant since $\mu_1 \equiv 0$, and
$\mu_2 \equiv \mu_2' - \mu^2$ in the sense of (3.3.10), and this will be
clearly indicated in any discussion in which these symbols may appear.

The covariance of $x_1$ and $x_2$, written cov $(x_1, x_2)$, is defined as

(3.4.3)    $\text{cov}(x_1, x_2) = \mathscr{E}[(x_1 - \mu_1)(x_2 - \mu_2)],$

which can be simplified to

(3.4.4)    $\text{cov}(x_1, x_2) = \mathscr{E}(x_1x_2) - \mathscr{E}(x_1) \cdot \mathscr{E}(x_2).$

Note that cov $(x_1, x_2)$ = cov $(x_2, x_1)$.

If $x_1$ and $x_2$ are independent, it follows from 3.2.6 that
$\mathscr{E}(x_1x_2) = \mathscr{E}(x_1) \cdot \mathscr{E}(x_2)$, and hence we
obtain the following result:

3.4.1 If $x_1$ and $x_2$ are independent, then

(3.4.5)    $\text{cov}(x_1, x_2) = 0.$


Remark. It should be noted that the condition cov $(x_1, x_2) = 0$ does not
imply that $x_1$ and $x_2$ are independent. For example, suppose
$(x_1, x_2)$ is a random variable satisfying the functional relation
$x_2 = \cos x_1$ with probability 1, and the p.d.f. of $x_1$ is given by

    $f_1(x_1) = \dfrac{1}{2\pi}$ for $0 < x_1 < 2\pi$, and $0$ otherwise.

It is found that cov $(x_1, x_2) = 0$, yet $x_2$ can be exactly
(functionally) determined from $x_1$ with probability 1.
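The remark can be checked numerically. The sketch below assumes that $x_1$ is uniformly distributed on $(0, 2\pi)$, which is one choice of p.d.f. for which the covariance with $x_2 = \cos x_1$ vanishes; the interval is an assumption of this illustration, as are the function name and the number of integration steps. Mean values are approximated by the midpoint rule.

```python
import math

def expectation(func, steps=200_000):
    """Midpoint-rule mean of func(x1) for x1 uniform on (0, 2*pi)."""
    h = 2.0 * math.pi / steps
    return sum(func((i + 0.5) * h) for i in range(steps)) * h / (2.0 * math.pi)

mean_x1 = expectation(lambda t: t)
mean_x2 = expectation(math.cos)
mean_x1x2 = expectation(lambda t: t * math.cos(t))
covariance = mean_x1x2 - mean_x1 * mean_x2  # (3.4.4)
```

The computed `covariance` is zero up to rounding error, although $x_2$ is a function of $x_1$.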


The correlation coefficient $\rho(x_1, x_2)$ between $x_1$ and $x_2$ is
defined as

(3.4.6)    $\rho(x_1, x_2) = \dfrac{\text{cov}(x_1, x_2)}{\sigma(x_1)\,\sigma(x_2)},$

and will be written as $\rho_{12}$ or as $\rho$ when no ambiguity arises as
to which random variables are involved. If $\rho(x_1, x_2) = 0$, $x_1$ and
$x_2$ are said to be uncorrelated.

Suppose we take the mean value of the random variable

    $\left[ t\,\dfrac{(x_1 - \mu_1)}{\sigma_1} + \dfrac{(x_2 - \mu_2)}{\sigma_2} \right]^2,$

where $t$ is a real constant. We have

(3.4.7)    $t^2 + 2t\rho + 1 \ge 0.$

The condition for $t^2 + 2t\rho + 1 \ge 0$, for all real $t$, is that
$\rho^2 - 1 \le 0$. Therefore

3.4.2 The correlation coefficient $\rho$ satisfies the condition
$-1 \le \rho \le 1$.

If $x_1$ and $x_2$ are properly linearly dependent [see Section 2.8(b)] then
$P(x_2 = \beta_0 + \beta_1 x_1) = 1$, where $\beta_1 \neq 0$, from which it follows that
$\mathscr{E}(x_2 - \beta_0 - \beta_1 x_1)^2 = 0$ and that $\rho = +1$ or $-1$, depending on whether
$\beta_1$ is a positive or a negative number. Conversely, if $\rho = \pm 1$, then the
random variable

(3.4.8) $\left[ \dfrac{x_1 - \mu_1}{\sigma_1} \mp \dfrac{x_2 - \mu_2}{\sigma_2} \right]^2$

is degenerate with mean value equal to 0. This means that if $\rho = \pm 1$

(3.4.9) $P\left[ \dfrac{x_1 - \mu_1}{\sigma_1} \mp \dfrac{x_2 - \mu_2}{\sigma_2} = 0 \right] = 1.$

Therefore

3.4.3 A necessary and sufficient condition for the nondegenerate random
variables $x_1$ and $x_2$, having finite variances, to be (properly) linearly
dependent is that $\rho^2 = 1$.
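As a small numerical illustration of 3.4.2 and 3.4.3, the following sketch
computes the correlation coefficient (3.4.6) for a random variable that takes
a finite number of equally likely values. The function `correlation` and the
sample points are hypothetical and serve only to show the behavior described
above.

```python
import math

def correlation(points):
    """Correlation coefficient (3.4.6) for equally likely points (x1, x2)."""
    n = len(points)
    m1 = sum(p[0] for p in points) / n
    m2 = sum(p[1] for p in points) / n
    var1 = sum((p[0] - m1) ** 2 for p in points) / n
    var2 = sum((p[1] - m2) ** 2 for p in points) / n
    cov = sum((p[0] - m1) * (p[1] - m2) for p in points) / n
    return cov / math.sqrt(var1 * var2)

xs = [0.0, 1.0, 2.0, 5.0]
rho_neg = correlation([(x, 3.0 - 2.0 * x) for x in xs])   # beta_1 < 0
rho_pos = correlation([(x, 1.0 + 0.5 * x) for x in xs])   # beta_1 > 0
rho_mid = correlation([(0.0, 0.0), (1.0, 2.0), (2.0, 1.0), (3.0, 3.0)])
print(rho_neg, rho_pos, rho_mid)
```

The two linearly dependent cases give $\rho = -1$ and $\rho = +1$ (up to rounding),
in agreement with the sign of $\beta_1$, while the third set of points gives a
value strictly between $-1$ and $1$.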

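The moments $\mu'_{r_1 r_2}$ and central moments $\mu_{r_1 r_2}$ are defined as follows:

(3.4.10) $\mu'_{r_1 r_2} = \mathscr{E}(x_1^{r_1} x_2^{r_2}),$

(3.4.11) $\mu_{r_1 r_2} = \mathscr{E}[(x_1 - \mu_1)^{r_1} (x_2 - \mu_2)^{r_2}].$

In this notation, the means of $x_1$ and $x_2$ are $\mu'_{10}$ and $\mu'_{01}$, and the
variances of $x_1$ and $x_2$ are $\mu_{20}$ and $\mu_{02}$, respectively. The covariance of
$x_1$ and $x_2$ is $\mu_{11}$ and the correlation coefficient is $\rho = \mu_{11} / \sqrt{\mu_{20}\mu_{02}}$.

Absolute moments $\nu'_{r_1 r_2}$, $\nu_{r_1 r_2}$ and factorial moments $\mu'_{[r_1, r_2]}$ are defined as

(3.4.12) $\nu'_{r_1 r_2} = \mathscr{E}(|x_1|^{r_1} \cdot |x_2|^{r_2}),$

(3.4.13) $\nu_{r_1 r_2} = \mathscr{E}(|x_1 - \mu_1|^{r_1} \cdot |x_2 - \mu_2|^{r_2}),$

(3.4.14) $\mu'_{[r_1, r_2]} = \mathscr{E}(x_1^{[r_1]} x_2^{[r_2]}), \quad x^{[r]} = x(x-1)\cdots(x-r+1).$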

3.5 MOMENTS OF k-DIMENSIONAL RANDOM VARIABLES

The extension of the foregoing definitions to $k$-dimensional random
variables is straightforward. Thus, if $(x_1, \ldots, x_k)$ is a $k$-dimensional
random variable having c.d.f. $F(x_1, \ldots, x_k)$, the mean value of $x_i$ is given
by (3.2.6) with $g(x_1, \ldots, x_k) = x_i$, which reduces to (3.2.5) where $H(y)$ is
the marginal c.d.f. of $x_i$, or stated more briefly,

(3.5.1) $\mu_i = \mathscr{E}(x_i) = \int_{R_k} x_i \, dF(x_1, \ldots, x_k) = \int_{-\infty}^{\infty} x_i \, dF_i(x_i).$

The variance $\sigma_i^2$ of $x_i$ is similarly defined from the marginal c.d.f. of $x_i$
in accordance with definition (3.3.2), and the covariance between $x_i$ and
$x_j$ from the marginal c.d.f. $F_{ij}(x_i, x_j)$ in accordance with definition (3.4.3).
The set of all variances and covariances among the $k$ components of the
random variable $(x_1, \ldots, x_k)$ form a $k \times k$ symmetric matrix, called the
covariance matrix

(3.5.2) $\|\sigma_{ij}\|, \quad i, j = 1, \ldots, k,$

$\sigma_{ii}$ being the variance of $x_i$ and $\sigma_{ij}$, $i \neq j$, being the covariance between
$x_i$ and $x_j$. If there is any possibility of ambiguity as to what random
variables are involved, we sometimes write the covariance matrix as
$\|\sigma(x_i, x_j)\|$.

It is convenient to say that the random variable $(x_1, \ldots, x_k)$ has mean
$(\mu_1, \ldots, \mu_k)$ and covariance matrix $\|\sigma_{ij}\|$.

We shall not only make considerable use of the covariance matrix (3.5.2)
but also of the inverse of the covariance matrix, namely, $\|\sigma_{ij}\|^{-1}$, which will
be denoted by

(3.5.3) $\|\sigma^{ij}\|$

where $\sigma^{ij}$ can be formally expressed as $\sigma^{ij} = (\text{cofactor of } \sigma_{ij} \text{ in } \|\sigma_{ij}\|)/|\sigma_{ij}|$,
and $|\sigma_{ij}|$ is the determinant of the matrix $\|\sigma_{ij}\|$. Note that $\|\sigma^{ij}\|$ will
exist only if $|\sigma_{ij}| \neq 0$. Note also that if $\|\sigma_{ij}\|$ has an inverse $\|\sigma^{ij}\|$,
then $|\sigma^{ij}| = 1/|\sigma_{ij}|$. Similar statements hold for the correlation matrix $\|\rho_{ij}\|$.

It will be recalled that the necessary and sufficient condition stated in
3.4.3 for the nondegenerate random variables $x_1$ and $x_2$ to be properly
linearly dependent is that $\rho^2 = 1$. An equivalent necessary and sufficient
condition can be stated as

$\begin{vmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{21} & \sigma_{22} \end{vmatrix} = 0.$

It is useful to have a criterion for linear dependence in the case of a
$k$-dimensional random variable. As stated in Section 2.8(d), the components
of the random variable $(x_1, \ldots, x_k)$ are properly linearly dependent
if there exists a set of constants $c_1, \ldots, c_k$, all different from zero,
such that $c_1 x_1 + \cdots + c_k x_k$ is a degenerate random variable. A criterion
for proper linear dependence is provided by the following extension of
Theorem 3.4.3.

3.5.1 A necessary and sufficient condition for the components of the random
variable $(x_1, \ldots, x_k)$ to be properly linearly dependent is that the
matrix $\|\sigma_{ij}\|$ be of rank $k - 1$.

First consider the sufficiency of the condition.

Since $\|\sigma_{ij}\|$ is of rank $k - 1$, $|\sigma_{ij}| = 0$ and all principal minors of $\|\sigma_{ij}\|$ are
positive, from which it follows that there exists a set of real constants $c_i$,
$i = 1, \ldots, k$, all different from zero, such that $\sum_{i=1}^{k} \sigma_{ij} c_i = 0$ for $j = 1, \ldots, k$.
Multiplying these equations by $c_j$ and summing with respect to $j$, we have

(3.5.4) $\sum_{i,j=1}^{k} \sigma_{ij} c_i c_j = 0$

which may be written as

(3.5.5) $\mathscr{E}\left[ \sum_{i=1}^{k} c_i (x_i - \mu_i) \right]^2 = 0.$

However, this is the variance of the random variable $\sum_{i=1}^{k} c_i x_i$. Since the
variance is zero, then $\sum_{i=1}^{k} c_i x_i$ is degenerate and, by definition, $x_1, \ldots, x_k$ are
properly linearly dependent since none of the $c_i$ are zero.

Now let us show that the condition is necessary. If $x_1, \ldots, x_k$ are
properly linearly dependent, then by definition, there exists a degenerate
random variable $\sum_{i=1}^{k} c_i x_i$ where the $c_i$ are real constants all different from
zero. Therefore,

(3.5.6) $\sigma^2\left( \sum_{i=1}^{k} c_i x_i \right) = \sum_{i,j=1}^{k} \sigma_{ij} c_i c_j = 0.$

Now consider $\psi(c_1, \ldots, c_k) = \sum_{i,j=1}^{k} \sigma_{ij} c_i c_j$ as a function of the $c_i$. Since
$\psi(c_1, \ldots, c_k)$ cannot be negative, it obviously has a minimum of zero at a
point (in the space of the $c$'s) for which no $c_i$ is zero by hypothesis. But the
values of the $c_i$ for which $\psi(c_1, \ldots, c_k)$ has a minimum satisfy the equations

(3.5.7) $\dfrac{\partial \psi}{\partial c_i} = 0, \quad i = 1, \ldots, k.$

These equations reduce to

(3.5.8) $\sum_{j=1}^{k} \sigma_{ij} c_j = 0, \quad i = 1, \ldots, k.$

Since the $c$'s are not zero we must have $|\sigma_{ij}| = 0$. It is seen that the
vanishing of any principal minor of $|\sigma_{ij}|$ contradicts the hypothesis that
none of the $c$'s are 0. Hence $\|\sigma_{ij}\|$ is of rank $k - 1$, which completes the
proof for 3.5.1.

If, for real $c_1, \ldots, c_k$, the quadratic form $\sum_{i,j=1}^{k} \sigma_{ij} c_i c_j$ vanishes (that is, has
a minimum of zero) only for $c_1 = \cdots = c_k = 0$, the quadratic form is
said to be positive definite and its matrix $\|\sigma_{ij}\|$ is said to be a positive
definite matrix.

A useful criterion of positive definiteness of a covariance matrix may be
stated as follows:

3.5.2 If $(x_1, \ldots, x_k)$ is a $k$-dimensional random variable with no degenerate
components, and having covariance matrix $\|\sigma_{ij}\|$, a necessary and
sufficient condition for $\|\sigma_{ij}\|$ to be positive definite is that there exist
no linear dependence among the components $x_1, \ldots, x_k$.

The proof is left to the reader.
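The determinant criteria of 3.5.1 and 3.5.2 can be illustrated on a random
variable with finitely many equally likely values. The sketch below is a
hypothetical example in exact rational arithmetic: in the first set of points
the third component equals $x_1 + x_2$, so that $c = (1, 1, -1)$ gives a degenerate
linear function, while in the second set no linear relation holds. The
function names and the data are assumptions of the sketch.

```python
from fractions import Fraction

def covariance_matrix(points):
    """Covariance matrix ||sigma_ij|| for equally likely sample points."""
    n, k = len(points), len(points[0])
    mu = [sum(Fraction(p[i]) for p in points) / n for i in range(k)]
    return [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / n
             for j in range(k)] for i in range(k)]

def determinant(m):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    m = [row[:] for row in m]
    det = Fraction(1)
    for c in range(len(m)):
        pivot = next((r for r in range(c, len(m)) if m[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            det = -det
        det *= m[c][c]
        for r in range(c + 1, len(m)):
            factor = m[r][c] / m[c][c]
            m[r] = [a - factor * b for a, b in zip(m[r], m[c])]
    return det

pairs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
dependent = [(a, b, a + b) for a, b in pairs]          # x3 = x1 + x2
free = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (1, 1, 0), (2, 1, 0)]

det_dependent = determinant(covariance_matrix(dependent))
det_free = determinant(covariance_matrix(free))
minor_12 = determinant(covariance_matrix(pairs))       # principal minor for (x1, x2)
print(det_dependent, det_free, minor_12)
```

In the dependent case the full determinant vanishes while the principal minor
for $(x_1, x_2)$ remains positive, so the matrix has rank $k - 1 = 2$; in the second
case the determinant is positive.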

The following property of quadratic forms will be useful in later sections.

3.5.3 If $\sum_{i,j=1}^{k} \sigma_{ij} c_i c_j$ is positive definite, the quadratic form $\sum_{i,j=1}^{k} \sigma^{ij} c_i c_j$ is also
positive definite.

The proof of this statement is straightforward and is therefore omitted.

The various higher moments $\mu'_{r_1 \cdots r_k}$, $\mu_{r_1 \cdots r_k}$, $\nu'_{r_1 \cdots r_k}$, $\nu_{r_1 \cdots r_k}$, and
$\mu'_{[r_1, \ldots, r_k]}$ of the random variable $(x_1, \ldots, x_k)$ are defined by
obvious extensions of (3.4.10), (3.4.11), (3.4.12), (3.4.13), and (3.4.14).


3.6 MEANS, VARIANCES, AND COVARIANCES OF LINEAR
FUNCTIONS OF RANDOM VARIABLES

The most frequently occurring type of function of several random
variables is a linear function. The following facts about the mean and
variance of such a function are useful:

3.6.1 If $(x_1, \ldots, x_k)$ is a $k$-dimensional random variable having mean
$(\mu_1, \ldots, \mu_k)$ and covariance matrix $\|\sigma_{ij}\|$, then the mean and variance
of the linear function

(3.6.1) $L = \sum_{i=1}^{k} c_i x_i,$

where $c_1, \ldots, c_k$ are constants, are

(3.6.2) $\mathscr{E}(L) = \sum_{i=1}^{k} c_i \mu_i$

and

(3.6.3) $\sigma^2(L) = \sum_{i,j=1}^{k} \sigma_{ij} c_i c_j.$

The proof is left to the reader.
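Formulas (3.6.2) and (3.6.3) can be checked directly on a small example. The
following sketch assumes a three-dimensional random variable with five
equally likely values and hypothetical constants $c_1, c_2, c_3$; it compares the
right-hand sides of (3.6.2) and (3.6.3) with the mean and variance computed
from the values of $L$ itself. All names and data are illustrative only.

```python
from fractions import Fraction

points = [(0, 0, 1), (1, 0, 0), (0, 1, 0), (1, 1, 0), (2, 1, 0)]  # equally likely
c = [2, -1, 3]   # hypothetical constants c_1, c_2, c_3
k = len(c)

def mean(values):
    """Mean value of a list of equally likely values (exact arithmetic)."""
    return sum(Fraction(v) for v in values) / len(values)

mu = [mean([p[i] for p in points]) for i in range(k)]
sigma = [[mean([(p[i] - mu[i]) * (p[j] - mu[j]) for p in points])
          for j in range(k)] for i in range(k)]

# Right-hand sides of (3.6.2) and (3.6.3).
mean_formula = sum(c[i] * mu[i] for i in range(k))
var_formula = sum(sigma[i][j] * c[i] * c[j] for i in range(k) for j in range(k))

# Direct computation from the values of L = sum_i c_i x_i.
L_values = [sum(ci * xi for ci, xi in zip(c, p)) for p in points]
mean_direct = mean(L_values)
var_direct = mean([(v - mean_direct) ** 2 for v in L_values])
print(mean_formula == mean_direct, var_formula == var_direct)
```

Because the arithmetic is exact, the two computations agree identically
rather than only to rounding error.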

In case the $x_i$ are independent we have, by 3.4.1, $\sigma_{ij} = 0$, $i \neq j$, and
hence the following important corollary of 3.6.1.

3.6.1a If $x_1, \ldots, x_k$ are uncorrelated random variables with variances
$\sigma_1^2, \ldots, \sigma_k^2$, the variance of $L$ is

(3.6.4) $\sigma^2(L) = \sum_{i=1}^{k} c_i^2 \sigma_i^2.$

More generally

3.6.2 If $L_p = \sum_{i=1}^{k} c_{pi} x_i$, $p = 1, \ldots, s$, are linear functions of the random
variables referred to in 3.6.1, then the $L_p$ have means

(3.6.5) $\mathscr{E}(L_p) = \sum_{i=1}^{k} c_{pi} \mu_i, \quad p = 1, \ldots, s,$

and covariance matrix

(3.6.6) $\|\sigma(L_p, L_q)\| = \left\| \sum_{i,j=1}^{k} c_{pi} c_{qj} \sigma_{ij} \right\|.$

If $x_1, \ldots, x_k$ are uncorrelated random variables with variances $\sigma_1^2, \ldots, \sigma_k^2$,
the covariance matrix $\|\sigma(L_p, L_q)\|$ reduces to

$\left\| \sum_{i=1}^{k} c_{pi} c_{qi} \sigma_i^2 \right\|.$

3.7 MEAN VALUES OF CONDITIONAL RANDOM VARIABLES

(a) Case of Two Variables

Suppose $x_2 \mid x_1$ is a conditional random variable whose c.d.f. is $F(x_2 \mid x_1)$
as defined in Section 2.9(b). The mean value of $x_2 \mid x_1$, if it exists, is
defined by

(3.7.1) $\mu(x_2 \mid x_1) = \mathscr{E}(x_2 \mid x_1) = \int_{-\infty}^{\infty} x_2 \, dF(x_2 \mid x_1).$

The quantity $\mu(x_2 \mid x_1)$, considered as a function of $x_1$, is called the
regression function of $x_2$ on $x_1$. Graphically, it represents the locus of the
center of gravity of the conditional random variable $x_2 \mid x_1$ as a function
of $x_1$. In particular, if

(3.7.2) $\mu(x_2 \mid x_1) = \beta_0 + \beta_1 x_1$

we have a linear regression function of $x_2$ on $x_1$, and $\beta_0$ and $\beta_1$ are called
regression coefficients.

More generally, the mean value of the conditional random variable
$g(x_2) \mid x_1$ is expressed by

(3.7.3) $\mathscr{E}(g(x_2) \mid x_1) = \int_{-\infty}^{\infty} g(x_2) \, dF(x_2 \mid x_1).$

In particular, the variance of $x_2 \mid x_1$ is given by

(3.7.4) $\sigma^2(x_2 \mid x_1) = \mathscr{E}[(x_2 - \mu(x_2 \mid x_1))^2 \mid x_1].$

The quantities $\mu(x_2 \mid x_1)$ and $\sigma^2(x_2 \mid x_1)$ are sometimes referred to by the
terms conditional mean and conditional variance of $x_2$ given $x_1$. $\sigma^2(x_2 \mid x_1)$
is also sometimes called the residual variance of $x_2$ on $x_1$.

The following statement, which is a corollary of 3.3.1, gives some useful
information about the relationship between $\mu(x_2 \mid x_1)$ and $\sigma^2(x_2 \mid x_1)$:

3.7.1 If $x_2 \mid x_1$ is a conditional random variable as defined in Section 2.9(b)
and if $u(x_1)$ is a real and single-valued function of $x_1$, the value of $u(x_1)$
which minimizes $\mathscr{E}[(x_2 - u(x_1))^2 \mid x_1]$ is given by $u(x_1) = \mu(x_2 \mid x_1)$.
Furthermore, the minimum of $\mathscr{E}[(x_2 - u(x_1))^2 \mid x_1]$ is $\sigma^2(x_2 \mid x_1)$.

If $\sigma^2(x_2 \mid x_1) \equiv 0$ for all values of $x_1$ at which $x_2 \mid x_1$ is defined, then it is
clear that we have $P[(x_2 = \mu(x_2 \mid x_1)) \mid x_1] \equiv 1$. This means, of course,
that if the value of the random variable $x_1$ is known (or given) for an
event, then the value of $x_2$ for that event is $\mu(x_2 \mid x_1)$ with probability 1.

If we put $g(x_2) \equiv x_2^r$ in (3.7.3) we obtain the $r$th moment of the
conditional random variable $x_2 \mid x_1$. The $r$th central moment and factorial
moment of $x_2 \mid x_1$ are similarly defined.

For the actual determination of the mean value of a function $g(x_1, x_2)$ of
a random variable $(x_1, x_2)$ in some of the simpler cases, it is sometimes
convenient to proceed by iterated integration. In this case conditional
random variables play an important role, which we express without proof
as follows:

3.7.2 Suppose $(x_1, x_2)$ is a random variable with c.d.f. $F(x_1, x_2)$. Let $x_2 \mid x_1$
and $x_1 \mid x_2$ be conditional random variables in the sense of Section 2.9(b)
with c.d.f.'s $F(x_2 \mid x_1)$ and $F(x_1 \mid x_2)$, respectively. Then, if $g(x_1, x_2)$
is a random variable,

(3.7.5) $\int_{R_2} g(x_1, x_2) \, dF(x_1, x_2) = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} g(x_1, x_2) \, dF(x_2 \mid x_1) \right] dF_1(x_1)$

or more briefly,

(3.7.5a) $\mathscr{E}(g(x_1, x_2)) = \mathscr{E}_{(x_1)}[\mathscr{E}_{(x_2)}(g(x_1, x_2) \mid x_1)] = \mathscr{E}_{(x_2)}[\mathscr{E}_{(x_1)}(g(x_1, x_2) \mid x_2)].$

The reader can readily verify 3.7.2 for the simpler, but common types of
conditional random variables referred to in Section 2.9.
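For a discrete random variable the verification of 3.7.2 is a matter of
regrouping a finite sum. The sketch below uses a hypothetical joint
probability table and an arbitrary function $g$; it computes the left member
of (3.7.5a) directly and the middle member by first forming the conditional
mean given $x_1$. The table, the function `g`, and the helper names are
assumptions made for illustration.

```python
from fractions import Fraction as F

# Hypothetical joint probabilities p(x1, x2) on a small grid (they sum to 1).
joint = {(0, 0): F(1, 8), (0, 1): F(3, 8), (1, 0): F(1, 4), (1, 2): F(1, 4)}

def g(x1, x2):
    return x1 + x2 * x2     # any function of the two components

# Left side of (3.7.5): mean value of g over the joint distribution.
direct = sum(p * g(x1, x2) for (x1, x2), p in joint.items())

# Right side: conditional mean of g given x1, then averaged over x1.
marginal_1 = {}
for (x1, _), p in joint.items():
    marginal_1[x1] = marginal_1.get(x1, 0) + p

def conditional_mean(x1):
    """Mean value of g(x1, x2) | x1 under the conditional distribution."""
    return sum(p / marginal_1[x1] * g(a, x2)
               for (a, x2), p in joint.items() if a == x1)

iterated = sum(marginal_1[x1] * conditional_mean(x1) for x1 in marginal_1)
print(direct, iterated)
```

Both computations give the same rational number, as (3.7.5) requires.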

Extension of 3.7.1 and 3.7.2 to more general conditional random
variables can be made by using the Radon-Nikodym theorem 1.10.1.

(b) Case of Several Variables

If $x_k \mid x_1, \ldots, x_{k-1}$ is a conditional random variable with c.d.f.
$F(x_k \mid x_1, \ldots, x_{k-1})$ in the sense of Section 2.9(c), the mean value of the
conditional random variable $g(x_k) \mid x_1, \ldots, x_{k-1}$ is defined by

(3.7.6) $\mathscr{E}(g(x_k) \mid x_1, \ldots, x_{k-1}) = \int_{-\infty}^{\infty} g(x_k) \, dF(x_k \mid x_1, \ldots, x_{k-1}).$

In particular, the mean and variance of $x_k \mid x_1, \ldots, x_{k-1}$ are defined by

(3.7.7) $\mu(x_k \mid x_1, \ldots, x_{k-1}) = \mathscr{E}(x_k \mid x_1, \ldots, x_{k-1})$

and

(3.7.8) $\sigma^2(x_k \mid x_1, \ldots, x_{k-1}) = \mathscr{E}[(x_k - \mu(x_k \mid x_1, \ldots, x_{k-1}))^2 \mid x_1, \ldots, x_{k-1}].$

The reader will note that 3.7.1 can be extended at once to the random
variable $x_k \mid x_1, \ldots, x_{k-1}$.

In general, if $x_{k_1+1}, \ldots, x_k \mid x_1, \ldots, x_{k_1}$ is a $k_2$-dimensional
conditional random variable, $k_2 = k - k_1$, with c.d.f.
$F(x_{k_1+1}, \ldots, x_k \mid x_1, \ldots, x_{k_1})$, the mean value of the random variable
$g(x_{k_1+1}, \ldots, x_k) \mid x_1, \ldots, x_{k_1}$ is given by

(3.7.9) $\mathscr{E}[g(x_{k_1+1}, \ldots, x_k) \mid x_1, \ldots, x_{k_1}] = \int_{R_{k_2}} g(x_{k_1+1}, \ldots, x_k) \, dF(x_{k_1+1}, \ldots, x_k \mid x_1, \ldots, x_{k_1}).$

It is now evident how one defines such quantities as the covariance
between two conditional random variables, say $x_k \mid x_1, \ldots, x_{k-2}$ and
$x_{k-1} \mid x_1, \ldots, x_{k-2}$, higher moments, etc.

The quantity $\mu(x_k \mid x_1, \ldots, x_{k-1})$, as a function of $x_1, \ldots, x_{k-1}$, is called
the regression function of $x_k$ on $x_1, \ldots, x_{k-1}$. In the particular case where

(3.7.10) $\mu(x_k \mid x_1, \ldots, x_{k-1}) = \beta_0 + \beta_1 x_1 + \cdots + \beta_{k-1} x_{k-1},$

$x_k$ has a linear regression function on $x_1, \ldots, x_{k-1}$.

Finally, the reader should note that 3.7.2 can be extended so that we
have as a generalization of (3.7.5),

(3.7.11) $\int_{R_k} g(x_1, \ldots, x_k) \, dF(x_1, \ldots, x_k) = \int_{R_{k_1}} \left[ \int_{R_{k_2}} g(x_1, \ldots, x_k) \, dF(x_{k_1+1}, \ldots, x_k \mid x_1, \ldots, x_{k_1}) \right] dF_{1 \cdots k_1}(x_1, \ldots, x_{k_1}),$

or written as a completely iterated integral

(3.7.12) $\int_{R_k} g(x_1, \ldots, x_k) \, dF(x_1, \ldots, x_k) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, \ldots, x_k) \, dF(x_k \mid x_1, \ldots, x_{k-1}) \cdot dF(x_{k-1} \mid x_1, \ldots, x_{k-2}) \cdots dF_1(x_1).$

There are, of course, $k!$ possible orders of iteration.

We point out that formulas (3.7.6) through (3.7.12) can be given
meaning under more general conditions on $F(x_1, \ldots, x_k)$ than those
imposed in Section 2.9(c), by use of the Radon-Nikodym theorem.


(c) The Correlation Ratio

The variance $\sigma^2(x_2 \mid x_1)$ of the conditional random variable $x_2 \mid x_1$
provides some information as to how well the $x_2$ component of a sample
point in $R_2$ can be determined when we are given the value of the $x_1$
component of the sample point. If $\sigma^2(x_2 \mid x_1) = 0$, the determination
holds with probability 1. If $\sigma^2(x_2 \mid x_1) \neq 0$, some notion of how good
the determination is can be obtained by taking the ratio $[\sigma^2(x_2 \mid x_1)]/\sigma_2^2$,
where, of course, $\sigma_2^2$ is assumed $\neq 0$. However, in general, this ratio
depends on $x_1$. A more useful criterion is sometimes obtained by taking
the mean value of this ratio with respect to $x_1$. Denoting this mean value
by $\eta^2_{2 \cdot 1}$, we have

(3.7.13) $\eta^2_{2 \cdot 1} = \dfrac{\mathscr{E}[\sigma^2(x_2 \mid x_1)]}{\sigma_2^2}$

where

(3.7.14) $\mathscr{E}[\sigma^2(x_2 \mid x_1)] = \int_{-\infty}^{\infty} \sigma^2(x_2 \mid x_1) \, dF_1(x_1).$

The quantity $\eta_{2 \cdot 1}$ is called the correlation ratio of $x_2$ on $x_1$. It is evident
that $0 \leq \eta^2_{2 \cdot 1} \leq 1$. We have $\eta^2_{2 \cdot 1} = 0$ only when $\sigma^2(x_2 \mid x_1) = 0$, that is,
only when $P[(x_2 = \mu(x_2 \mid x_1)) \mid x_1] = 1$. At the other extreme $\eta^2_{2 \cdot 1} = 1$
only when $x_1$ and $x_2$ are uncorrelated.
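The definition (3.7.13) is easily evaluated for a discrete random variable.
The following sketch, with hypothetical joint probability tables, computes
$\eta^2_{2 \cdot 1}$ as the mean conditional variance divided by $\sigma_2^2$. Note that with
this definition a value of 0 corresponds to exact determination of $x_2$ by $x_1$.
The function name and the tables are assumptions of the sketch.

```python
from fractions import Fraction as F

joint = {(0, 0): F(1, 4), (0, 2): F(1, 4), (1, 1): F(1, 4), (1, 3): F(1, 4)}

def correlation_ratio_sq(joint):
    """eta^2_{2.1} = E[sigma^2(x2 | x1)] / sigma_2^2, as in (3.7.13)."""
    p1 = {}
    for (x1, _), p in joint.items():
        p1[x1] = p1.get(x1, 0) + p
    mean_cond_var = 0
    for a, pa in p1.items():
        # Conditional distribution of x2 given x1 = a.
        cond = {x2: p / pa for (x1, x2), p in joint.items() if x1 == a}
        m = sum(x2 * q for x2, q in cond.items())
        mean_cond_var += pa * sum((x2 - m) ** 2 * q for x2, q in cond.items())
    mu2 = sum(x2 * p for (_, x2), p in joint.items())
    var2 = sum((x2 - mu2) ** 2 * p for (_, x2), p in joint.items())
    return mean_cond_var / var2

eta_sq = correlation_ratio_sq(joint)
# Here x2 is a function of x1, so every conditional variance is zero.
eta_sq_exact = correlation_ratio_sq({(0, 1): F(1, 2), (1, 4): F(1, 2)})
print(eta_sq, eta_sq_exact)
```

For the first table the conditional variance is 1 for each value of $x_1$ and
$\sigma_2^2 = 5/4$, so the ratio is $4/5$; for the second table the ratio is 0.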

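The multiple correlation ratio is defined as

(3.7.15) $\eta^2_{k \cdot 12 \cdots (k-1)} = \dfrac{\mathscr{E}[\sigma^2(x_k \mid x_1, \ldots, x_{k-1})]}{\sigma_k^2}$

where

(3.7.16) $\mathscr{E}[\sigma^2(x_k \mid x_1, \ldots, x_{k-1})] = \int_{R_{k-1}} \sigma^2(x_k \mid x_1, \ldots, x_{k-1}) \, dF(x_1, \ldots, x_{k-1}).$

As in the case of the correlation ratio we have $0 \leq \eta^2_{k \cdot 12 \cdots (k-1)} \leq 1$.
The value 0 occurs only if $P[(x_k = \mu(x_k \mid x_1, \ldots, x_{k-1})) \mid x_1, \ldots, x_{k-1}] = 1$.
The value 1 occurs only if $x_k$ and $(x_1, \ldots, x_{k-1})$ are uncorrelated.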


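3.8 LEAST SQUARES LINEAR REGRESSION

(a) Case of Two Variables

Suppose we consider only linear functions of form $\beta_0 + \beta_1 x_1$ for the
function $u(x_1)$ referred to in 3.7.1 and determine the values of $\beta_0$, $\beta_1$
for which $\mathscr{E}(x_2 - \beta_0 - \beta_1 x_1)^2$ is a minimum. As usual, we assume that
neither $x_1$ nor $x_2$ is degenerate, that is, that $\sigma_1^2$ and $\sigma_2^2$ are both positive.
Denoting this function by $\psi(\beta_0, \beta_1)$, we may write it as follows:

(3.8.1) $\psi(\beta_0, \beta_1) = \mathscr{E}[(x_2 - \mu_2) - (\beta_0 - \mu_2 + \beta_1 \mu_1) - \beta_1 (x_1 - \mu_1)]^2$

where $\mu_1 = \mathscr{E}(x_1)$ and $\mu_2 = \mathscr{E}(x_2)$. The values of $\beta_0$ and $\beta_1$ which minimize
$\psi(\beta_0, \beta_1)$ will be given by the equations

(3.8.2) $\dfrac{\partial \psi}{\partial \beta_0} = 0, \quad \dfrac{\partial \psi}{\partial \beta_1} = 0.$

Using the notation of Section 3.4, and simplifying these equations, we
obtain

(3.8.3) $\beta_0 - \mu_2 + \beta_1 \mu_1 = 0,$
        $\rho \sigma_1 \sigma_2 - \beta_1 \sigma_1^2 = 0.$

Denoting the solution by $\beta_0^*$, $\beta_1^*$ we find

(3.8.4) $\beta_0^* = \mu_2 - \rho \dfrac{\sigma_2}{\sigma_1} \mu_1, \quad \beta_1^* = \rho \dfrac{\sigma_2}{\sigma_1}.$

Therefore:

3.8.1 The linear function $u(x_1)$ which minimizes $\mathscr{E}(x_2 - u(x_1))^2$ is given by

(3.8.5) $u(x_1) = \mu_2 + \rho \dfrac{\sigma_2}{\sigma_1} (x_1 - \mu_1).$

The line whose equation is

(3.8.6) $x_2 = \mu_2 + \rho \dfrac{\sigma_2}{\sigma_1} (x_1 - \mu_1)$

or, written more symmetrically,

(3.8.7) $\dfrac{x_2 - \mu_2}{\sigma_2} = \rho \dfrac{x_1 - \mu_1}{\sigma_1}$

is called the least squares regression line of $x_2$ on $x_1$.

It can be verified that:

3.8.2 The minimum value of $\psi(\beta_0, \beta_1)$, which occurs for the
values of $\beta_0$ and $\beta_1$ given by (3.8.4), and which is denoted by $\sigma^2_{2 \cdot 1}$, is
given by

(3.8.8) $\sigma^2_{2 \cdot 1} = \sigma_2^2 (1 - \rho^2).$

The quantity $\sigma^2_{2 \cdot 1}$ is called the least squares residual variance of $x_2$ on $x_1$.
It is zero if and only if $\rho = \pm 1$, that is, if and only if $x_1$ and $x_2$ (both
assumed nondegenerate) are properly linearly dependent, as pointed out
in 3.4.3.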
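The results 3.8.1 and 3.8.2 can be illustrated numerically. The sketch below
assumes a random variable $(x_1, x_2)$ taking five equally likely values (the
values are hypothetical), computes $\beta_0^*$ and $\beta_1^*$ from (3.8.4), and compares the
value of $\psi(\beta_0^*, \beta_1^*)$ with the residual variance (3.8.8). It also confirms that
nearby values of $\beta_0$ and $\beta_1$ give a larger value of $\psi$.

```python
import math

points = [(0.0, 1.0), (1.0, 3.0), (2.0, 2.0), (3.0, 5.0), (4.0, 4.0)]  # equally likely
n = len(points)
mu1 = sum(x1 for x1, _ in points) / n
mu2 = sum(x2 for _, x2 in points) / n
var1 = sum((x1 - mu1) ** 2 for x1, _ in points) / n
var2 = sum((x2 - mu2) ** 2 for _, x2 in points) / n
cov = sum((x1 - mu1) * (x2 - mu2) for x1, x2 in points) / n
rho = cov / math.sqrt(var1 * var2)

# Solution (3.8.4) of the equations (3.8.3).
beta1 = rho * math.sqrt(var2 / var1)
beta0 = mu2 - beta1 * mu1

def psi(b0, b1):
    """Mean squared deviation E(x2 - b0 - b1*x1)^2 for the points above."""
    return sum((x2 - b0 - b1 * x1) ** 2 for x1, x2 in points) / n

residual_variance = var2 * (1.0 - rho ** 2)      # formula (3.8.8)
print(beta0, beta1, psi(beta0, beta1), residual_variance)
```

For these points $\rho = 0.8$, the regression line is $x_2 = 1.4 + 0.8 x_1$, and both
computations of the residual variance give $0.72$ up to rounding.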

By interchanging $x_1$ and $x_2$ one obtains the least squares regression line
of $x_1$ on $x_2$,

(3.8.9) $\dfrac{x_1 - \mu_1}{\sigma_1} = \rho \dfrac{x_2 - \mu_2}{\sigma_2}.$

Since neither $x_1$ nor $x_2$ is degenerate, it is clear that the two regression lines
(3.8.7) and (3.8.9) will coincide if and only if $\rho = \pm 1$.

A question which arises here is whether there are special two-dimensional 
probability distributions which have the property that the least squares 
regression functions are identical with the actual regression functions. 
The answer is in the affirmative, and one of the most important distri¬ 
butions in mathematical statistics which has this property is the two- 
dimensional normal or Gaussian distribution which will be discussed in 
Section 7.3. This distribution, as we shall see, also has the property that
$\sigma^2(x_2 \mid x_1)$ does not depend on $x_1$ and, in fact, is equal to $\sigma^2_{2 \cdot 1}$.

A criterion for indicating how effectively the least squares regression line
can be used for determining the value of $x_2$ of an event when $x_1$ is given is
to compare the variance of $x_2$ about the least squares regression function
of $x_2$ on $x_1$ with the variance of $x_2$, ignoring $x_1$, that is, to compare $\sigma^2_{2 \cdot 1}$
with $\sigma_2^2$. The ratio of these two quantities, which we write as follows:

(3.8.10) $\eta^2_{2 \cdot 1,(L)} = \dfrac{\sigma^2_{2 \cdot 1}}{\sigma_2^2} = 1 - \rho^2$

is called the linear correlation ratio. If $x_2$ is linearly related to $x_1$, that is,
if $x_1$ and $x_2$ are random variables which are properly linearly dependent,
then $\rho^2 = 1$ and $\eta^2_{2 \cdot 1,(L)} = 0$. Conversely, if $\rho^2 = 1$ (and $\eta^2_{2 \cdot 1,(L)} = 0$), $x_1$
and $x_2$ are linearly dependent. At the other extreme, if $x_1$ and $x_2$ are
uncorrelated, then $\rho = 0$ and $\eta^2_{2 \cdot 1,(L)} = 1$, and the least squares regression
line contains no information for determining $x_2$.

(b) Case of Several Variables

Now consider linear functions of the form

(3.8.11) $\beta_0 + \beta_1 x_1 + \cdots + \beta_{k-1} x_{k-1}$

and let us determine $\beta_0, \beta_1, \ldots, \beta_{k-1}$ so as to minimize the quantity

(3.8.12) $\psi(\beta_0, \beta_1, \ldots, \beta_{k-1}) = \mathscr{E}[x_k - \beta_0 - \beta_1 x_1 - \cdots - \beta_{k-1} x_{k-1}]^2.$

As in the two-dimensional case, it will be convenient to write (3.8.12) as

(3.8.13) $\psi = \mathscr{E}[(x_k - \mu_k) - (\beta_0 - \mu_k + \beta_1 \mu_1 + \cdots + \beta_{k-1} \mu_{k-1}) - \beta_1 (x_1 - \mu_1) - \cdots - \beta_{k-1} (x_{k-1} - \mu_{k-1})]^2.$

The values of the $\beta$'s which minimize (3.8.13) are given by the solution
of the equations

(3.8.14) $\dfrac{\partial \psi}{\partial \beta_\alpha} = 0, \quad \alpha = 0, 1, \ldots, k - 1.$

The first equation $\partial \psi / \partial \beta_0 = 0$ reduces to

(3.8.15) $\beta_0 - \mu_k + \beta_1 \mu_1 + \cdots + \beta_{k-1} \mu_{k-1} = 0.$

Making use of this result in the remaining $k - 1$ equations in (3.8.14),
we find that they reduce to

(3.8.16) $\beta_1 \sigma_{11} + \cdots + \beta_{k-1} \sigma_{1,k-1} = \sigma_{1k}$
         $\cdots$
         $\beta_1 \sigma_{k-1,1} + \cdots + \beta_{k-1} \sigma_{k-1,k-1} = \sigma_{k-1,k}$

where $\|\sigma_{ij}\|$, $i, j = 1, \ldots, k$, is the covariance matrix of the components
of the random variable $(x_1, \ldots, x_k)$ as defined in Section 3.5. Hence, it is
evident that:


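3.8.3 If $|\sigma_{pq}| \neq 0$, $p, q = 1, \ldots, k - 1$, and if the values of $\beta_0, \beta_1, \ldots, \beta_{k-1}$
which minimize $\psi$ are denoted by $\beta_0^*, \beta_1^*, \ldots, \beta_{k-1}^*$, then they are
given by

(3.8.17) $\beta_0^* = \mu_k - \beta_1^* \mu_1 - \cdots - \beta_{k-1}^* \mu_{k-1},$
         $\beta_p^* = \sum_{q=1}^{k-1} \sigma_0^{pq} \sigma_{qk}, \quad p = 1, \ldots, k - 1,$

where $\|\sigma_0^{pq}\|$ is the inverse of the covariance matrix $\|\sigma_{pq}\|$, $p, q = 1, \ldots, k - 1$.

It will be remembered from 3.5.1 that $|\sigma_{pq}| \neq 0$ implies that $x_1, \ldots, x_{k-1}$
are not linearly dependent.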

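Finally, we may state that the least squares linear regression hyperplane
of $x_k$ on $x_1, \ldots, x_{k-1}$ has as its equation

(3.8.18) $x_k = \mu_k + \sum_{p=1}^{k-1} \beta_p^* (x_p - \mu_p).$

Equation (3.8.18) can also be written in more easily remembered
determinantal form as

(3.8.19) $\begin{vmatrix} \sigma_{11} & \cdots & \sigma_{1,k-1} & \sigma_{1k} \\ \vdots & & \vdots & \vdots \\ \sigma_{k-1,1} & \cdots & \sigma_{k-1,k-1} & \sigma_{k-1,k} \\ (x_1 - \mu_1) & \cdots & (x_{k-1} - \mu_{k-1}) & (x_k - \mu_k) \end{vmatrix} = 0$

as the reader will see by expanding the determinant with respect to the
bottom row.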

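Now consider the problem of finding the value of the minimum of
$\psi(\beta_0, \ldots, \beta_{k-1})$ with respect to the $\beta$'s. Substituting the values of the $\beta^*$'s
from (3.8.17) into (3.8.13) and denoting the minimum value of $\psi$ by
$\sigma^2_{k \cdot 12 \cdots (k-1)}$ we have

(3.8.20) $\sigma^2_{k \cdot 12 \cdots (k-1)} = \mathscr{E}[(x_k - \mu_k) - \beta_1^* (x_1 - \mu_1) - \cdots - \beta_{k-1}^* (x_{k-1} - \mu_{k-1})]^2$
         $= \sigma_{kk} - 2 \sum_{p=1}^{k-1} \sigma_{pk} \beta_p^* + \sum_{p,q=1}^{k-1} \sigma_{pq} \beta_p^* \beta_q^*.$

Substituting the value of $\beta_p^*$ from (3.8.17) into the last two terms of
(3.8.20), we have

(3.8.21) $\sum_{p=1}^{k-1} \sigma_{pk} \beta_p^* = \sum_{p,q=1}^{k-1} \sigma_0^{pq} \sigma_{pk} \sigma_{qk},$

(3.8.22) $\sum_{p,q=1}^{k-1} \sigma_{pq} \beta_p^* \beta_q^* = \sum_{p,q=1}^{k-1} \sum_{r,s=1}^{k-1} \sigma_{pq} \sigma_0^{pr} \sigma_0^{qs} \sigma_{rk} \sigma_{sk} = \sum_{r,s=1}^{k-1} \sigma_0^{rs} \sigma_{rk} \sigma_{sk},$

since $\sum_{p=1}^{k-1} \sigma_{pq} \sigma_0^{pr} = \delta_{qr}$, where $\delta_{qr}$ is the Kronecker delta, which has the
value 1 if $q = r$ and 0 if $q \neq r$. The extreme right member of (3.8.22) is,
of course, the same as the right side of (3.8.21). Therefore, we have

(3.8.23) $\sigma^2_{k \cdot 12 \cdots (k-1)} = \sigma_{kk} - \sum_{p,q=1}^{k-1} \sigma_0^{pq} \sigma_{pk} \sigma_{qk}.$

But

(3.8.24) $\sigma_{kk} - \sum_{p,q=1}^{k-1} \sigma_0^{pq} \sigma_{pk} \sigma_{qk} = \dfrac{|\sigma_{ij}|}{|\sigma_{pq}|}$

where $i, j = 1, \ldots, k$ and $p, q = 1, \ldots, k - 1$, as will be seen by performing
a bordered expansion [see Bôcher (1907), for example] of the
determinant $|\sigma_{ij}|$ by the $k$th row and $k$th column. Therefore, we finally
obtain the following result:

3.8.4 The minimum value of $\psi(\beta_0, \ldots, \beta_{k-1})$, denoted by $\sigma^2_{k \cdot 12 \cdots (k-1)}$,
is given by

(3.8.25) $\sigma^2_{k \cdot 12 \cdots (k-1)} = \dfrac{|\sigma_{ij}|}{|\sigma_{pq}|}.$

The quantity $\sigma^2_{k \cdot 12 \cdots (k-1)}$ is called the least squares residual variance of
$x_k$ on $x_1, \ldots, x_{k-1}$; it is the variance of $x_k$ around the least squares
regression plane whose equation is (3.8.18).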
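The steps from (3.8.16) to (3.8.25) can be followed on a small numerical
case. The sketch below assumes a hypothetical positive definite covariance
matrix for $(x_1, x_2, x_3)$, so that $k = 3$; it solves the equations (3.8.16) for
$\beta_1^*, \beta_2^*$ in exact arithmetic, evaluates the residual variance from the
expansion (3.8.20), and compares it with the determinant ratio (3.8.25). The
helper functions `solve` and `det` are written only for this illustration.

```python
from fractions import Fraction

def solve(a, b):
    """Solve a*x = b by Gauss-Jordan elimination (a assumed nonsingular)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(bv)] for row, bv in zip(a, b)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if m[r][c] != 0)
        m[c], m[pivot] = m[pivot], m[c]
        m[c] = [v / m[c][c] for v in m[c]]
        for r in range(n):
            if r != c:
                m[r] = [v - m[r][c] * w for v, w in zip(m[r], m[c])]
    return [row[n] for row in m]

def det(m):
    """Determinant by expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

# Hypothetical covariance matrix ||sigma_ij|| of (x_1, x_2, x_3); here k = 3.
sigma = [[4, 2, 2],
         [2, 3, 2],
         [2, 2, 2]]
k = len(sigma)
sub = [row[:k - 1] for row in sigma[:k - 1]]      # ||sigma_pq||, p, q = 1..k-1
rhs = [sigma[p][k - 1] for p in range(k - 1)]     # sigma_pk

beta = solve(sub, rhs)                            # beta_1*, ..., beta_{k-1}* from (3.8.16)

# Residual variance from the expansion (3.8.20) ...
residual = (sigma[k - 1][k - 1]
            - 2 * sum(rhs[p] * beta[p] for p in range(k - 1))
            + sum(sub[p][q] * beta[p] * beta[q]
                  for p in range(k - 1) for q in range(k - 1)))
# ... and from the determinant ratio (3.8.25).
residual_det = Fraction(det(sigma), det(sub))
print(beta, residual, residual_det)
```

For this matrix $\beta_1^* = 1/4$, $\beta_2^* = 1/2$, and both expressions for the residual
variance equal $1/2$.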

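The correlation coefficient between the random variable $x_k$ and the least
squares regression function $\mu_k + \sum_{p=1}^{k-1} \beta_p^* (x_p - \mu_p)$ is called the multiple
correlation coefficient between $x_k$ and $(x_1, \ldots, x_{k-1})$. It is denoted by
$\rho_{k \cdot 12 \cdots (k-1)}$ and is expressed in terms of the elements of $\|\sigma_{ij}\|$ by the
following formula

(3.8.26) $\rho_{k \cdot 12 \cdots (k-1)} = \sqrt{1 - \dfrac{|\sigma_{ij}|}{\sigma_{kk} |\sigma_{pq}|}}.$

To establish this formula we determine the variances of $x_k$ and of
$\mu_k + \sum_{p=1}^{k-1} \beta_p^* (x_p - \mu_p)$ and the covariance between them. We have

$\sigma^2(x_k) = \sigma_{kk},$

$\sigma^2\left[ \mu_k + \sum_{p=1}^{k-1} \beta_p^* (x_p - \mu_p) \right] = \sum_{p,q=1}^{k-1} \sigma_{pq} \beta_p^* \beta_q^*,$

$\mathrm{cov}\left[ x_k, \ \mu_k + \sum_{p=1}^{k-1} \beta_p^* (x_p - \mu_p) \right] = \sum_{p=1}^{k-1} \sigma_{pk} \beta_p^*.$

Applying the definition (3.4.6) of the correlation coefficient, and using
(3.8.21) and (3.8.22) and positive square roots, we obtain

(3.8.27) $\rho_{k \cdot 12 \cdots (k-1)} = \sqrt{\dfrac{1}{\sigma_{kk}} \sum_{p,q=1}^{k-1} \sigma_0^{pq} \sigma_{pk} \sigma_{qk}}.$

Making use of (3.8.24) in (3.8.27), we obtain formula (3.8.26).

From (3.8.25) and (3.8.26) it is evident that

(3.8.28) $\sigma^2_{k \cdot 12 \cdots (k-1)} = \sigma_{kk} (1 - \rho^2_{k \cdot 12 \cdots (k-1)}).$

The linear correlation ratio is defined by

(3.8.29) $\eta^2_{k \cdot 12 \cdots (k-1),(L)} = \dfrac{\sigma^2_{k \cdot 12 \cdots (k-1)}}{\sigma_{kk}} = 1 - \rho^2_{k \cdot 12 \cdots (k-1)}.$

As with the linear correlation ratio for two random variables, $\eta^2_{k \cdot 12 \cdots (k-1),(L)}$
will be zero if and only if $\rho^2_{k \cdot 12 \cdots (k-1)} = 1$, that is, if and only if the
probability contained in the regression plane of $x_k$ on $(x_1, \ldots, x_{k-1})$ is
unity.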

PROBLEMS 


3.1 If $x$ is a random variable whose first absolute central moment $\nu_1$ exists,
show that the mean $\mu$ of $x$ is finite, and that for $\lambda > 0$

$P(|x - \mu| < \lambda \nu_1) \geq 1 - \dfrac{1}{\lambda}.$

3.2 If ^(x) is a measurable function of a random variable x show that 

P(l^(^)l >^)< for A > 0. 

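3.3 Show that if the first $r$ moments $\mu'_1, \ldots, \mu'_r$ (and central moments
$\mu_1, \ldots, \mu_r$) of a random variable $x$ exist, then, with $\mu = \mu'_1$ and $\mu'_0 = \mu_0 = 1$,

$\mu_r = \sum_{i=0}^{r} (-1)^i \binom{r}{i} \mu'_{r-i} \mu^i$ and $\mu'_r = \sum_{i=0}^{r} \binom{r}{i} \mu_{r-i} \mu^i.$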

3.4 If $x$ is any discrete random variable whose mass points are $0, 1, 2, \ldots, r$,
show that all factorial moments higher than the $r$th are zero.

3.5 If the moments $\mu_{2r}$, $\mu_{2r+1}$, $\mu_{2r+2}$ of a random variable exist, show that

$(\mu_{2r+1})^2 \leq \mu_{2r} \mu_{2r+2}.$

3.6 Prove 3.5.2 and 3.5.3. 


3.7 If $(x_1, \ldots, x_k)$ is a $k$-dimensional random variable having means
$(\mu_1, \ldots, \mu_k)$ and (positive definite) covariance matrix $\|\sigma_{ij}\|$ with inverse
$\|\sigma^{ij}\|$, show that

(7«(a;< - MiX^i - Mi) > 1 - ^ . 

3.8 Prove 3.6.1. 



3.9 If (a?i..., Xfi) are independent random variables having zero means and 
unit variances show that 

<i. 


3.10 If $x$ is a random variable with mean $\mu$ and variance $\sigma^2$ and has c.d.f.
$F(x)$, show that

$F(x) \leq \dfrac{\sigma^2}{\sigma^2 + (x - \mu)^2}$ if $x < \mu$,

$F(x) \geq \dfrac{(x - \mu)^2}{\sigma^2 + (x - \mu)^2}$ if $x > \mu$,

[see Cramér (1946)].

3.11 If the $2r$th moments of two random variables $x$ and $y$ are finite, show
that the $2r$th moment of $x + y$ is also finite.

3.12 If $x$ is a random variable having mean $\mu$ and variance $\sigma^2$ and a p.d.f.
with absolute maximum at $c$, show that

$P\left( |x - c| > \lambda \sqrt{\sigma^2 + (\mu - c)^2} \right) \leq \dfrac{4}{9 \lambda^2},$

a result due to Gauss.


3.13 (Continuation) Show that

$P(|x - \mu| > \lambda \sigma) \leq \dfrac{4(1 + \delta^2)}{9(\lambda - |\delta|)^2}$

where $\delta = (\mu - c)/\sigma$ and $\lambda > |\delta|$. [Cramér (1946)].


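3.14 If $x_1, \ldots, x_n$ are independent random variables having means all equal
to $\mu$ and variances all equal to $\sigma^2$, show by Chebyshev's inequality that for any
$\delta > 0$

$\lim_{n \to \infty} P\left( \left| \dfrac{x_1 + \cdots + x_n}{n} - \mu \right| < \delta \right) = 1.$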


3.15 If $(x_1, \ldots, x_k)$ is a $k$-dimensional random variable such that the
correlation coefficient between each pair of components is $\rho$, show that

3.16 Suppose $(x_1, \ldots, x_m, y_1, \ldots, y_n)$ is an $(m + n)$-dimensional random
variable such that the variances of all components are unity, $\mathrm{cov}(x_i, x_j) = \rho_1$,
$\mathrm{cov}(y_i, y_j) = \rho_2$, and $\mathrm{cov}(x_i, y_j) = \rho_3$. If $u = x_1 + \cdots + x_m$, and
$v = y_1 + \cdots + y_n$, show that the correlation coefficient between $u$ and $v$ is

$\rho(u, v) = \dfrac{\sqrt{mn} \, \rho_3}{\sqrt{1 + (m - 1)\rho_1} \cdot \sqrt{1 + (n - 1)\rho_2}}.$


3.17 If $x_1, \ldots, x_p$, $y_1, \ldots, y_q$, $z_1, \ldots, z_r$ are random variables having unit
variances and zero covariances, show that the correlation coefficient between
$u$ and $v$, where $u = x_1 + \cdots + x_p + y_1 + \cdots + y_q$ and
$v = x_1 + \cdots + x_p + z_1 + \cdots + z_r$, is given by

$\rho(u, v) = \dfrac{p}{\sqrt{(p + q)(p + r)}}.$

3.18 Suppose $(x_1, \ldots, x_n)$ are random variables such that $x_1$ and the
conditional random variables $x_2 \mid x_1, \ldots, x_n \mid x_{n-1}$ are independent. If

$\mathscr{E}(x_1) = \mu, \quad \mathscr{E}(x_i \mid x_{i-1}) = x_{i-1}, \quad i = 2, \ldots, n$

and if

$\mathscr{E}(x_1 - \mu)^2 = \mathscr{E}[(x_i - x_{i-1})^2 \mid x_{i-1}] = \sigma^2, \quad i = 2, \ldots, n$

show that the unconditional mean and variance of $x_n$ are $\mu$ and $n\sigma^2$, respectively.

3.19 If $x$ is a random variable whose first $2k$ moments $\mu_1, \ldots, \mu_{2k}$ exist, show
that the matrix $\|\mu_{i+j}\|$, $i, j = 1, \ldots, k$, is positive definite.

3.20 If $x$ is a non-negative random variable, show that $\mathscr{E}(1/x) \geq 1/\mathscr{E}(x)$.

3.21 Suppose $(x_1, \ldots, x_m)$ is an $m$-dimensional random variable representing
scores of a student (taken "at random" from grade G) on $m$ questions of an
examination in subject A and $(y_1, \ldots, y_n)$ is an $n$-dimensional random variable
representing his scores on an examination in subject B. Let $T_1 = x_1 + \cdots + x_m$
and $T_2 = y_1 + \cdots + y_n$ be the total "scores" on the examinations in A and B,
respectively. Let $\sigma^2_{1,m}$ and $\rho_{1,m} \sigma^2_{1,m}$ be the average of the variances of the $x$'s and
the average of the covariances of all pairs of $x$'s, respectively. Let $\sigma^2_{2,n}$, $\rho_{2,n} \sigma^2_{2,n}$
be the corresponding quantities for the $y$'s. Let $\rho_{3,m,n} \sigma_{1,m} \sigma_{2,n}$ be the average of
the covariances of all $x, y$ pairs. If, as $m, n \to \infty$, we have $\sigma^2_{1,m} \to \sigma_1^2$, $\sigma^2_{2,n} \to \sigma_2^2$,
$\rho_{1,m} \to \rho_1$, $\rho_{2,n} \to \rho_2$, $\rho_{3,m,n} \to \rho_3$, where $\sigma_1^2, \sigma_2^2, \rho_1, \rho_2, \rho_3$ are all positive, show that

$\lim_{m,n \to \infty} \rho(T_1, T_2) = \dfrac{\rho_3}{\sqrt{\rho_1 \rho_2}}.$

(Hence, if $T$ and $T^*$ are scores of the student on two very long examinations in
the same subject (A or B), $\rho(T, T^*) \cong 1$.)

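3.22 Suppose $(x_1, x_2, x_3)$ is a three-dimensional random variable having (finite)
covariance matrix $\|\sigma_{ij}\|$ and correlation matrix $\|\rho_{ij}\|$, $i, j = 1, 2, 3$. Let
$\beta^*_{02} + \beta^*_{12} x_1$ be the least squares regression line of $x_2$ on $x_1$, and
$\beta^*_{03} + \beta^*_{13} x_1$ the least squares regression line of $x_3$ on $x_1$. Let
$y_2 = x_2 - \beta^*_{02} - \beta^*_{12} x_1$ and $y_3 = x_3 - \beta^*_{03} - \beta^*_{13} x_1$. The correlation
coefficient between $y_2$ and $y_3$, denoted by $\rho_{23 \cdot 1}$, is called the partial
correlation coefficient between $x_2$ and $x_3$ with $x_1$ held constant. Show that

$\rho_{23 \cdot 1} = \dfrac{\rho_{23} - \rho_{12} \rho_{13}}{\sqrt{(1 - \rho_{12}^2)(1 - \rho_{13}^2)}}.$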


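3.23 Generalizing the preceding problem, let $(x_1, \ldots, x_{k+1})$ be a $(k + 1)$-
dimensional random variable with finite covariance matrix $\|\sigma_{ij}\|$ and correlation
matrix $\|\rho_{ij}\|$, $i, j = 1, \ldots, k + 1$.

Let $\beta^*_{0k} + \beta^*_{1k} x_1 + \cdots + \beta^*_{k-1,k} x_{k-1}$ be the least squares regression "plane" of
$x_k$ on $x_1, \ldots, x_{k-1}$ and $\beta^*_{0,k+1} + \beta^*_{1,k+1} x_1 + \cdots + \beta^*_{k-1,k+1} x_{k-1}$ the least squares
regression "plane" of $x_{k+1}$ on $x_1, \ldots, x_{k-1}$.

Let $y_1 = x_k - \beta^*_{0k} - \beta^*_{1k} x_1 - \cdots - \beta^*_{k-1,k} x_{k-1}$
and $y_2 = x_{k+1} - \beta^*_{0,k+1} - \beta^*_{1,k+1} x_1 - \cdots - \beta^*_{k-1,k+1} x_{k-1}$.

The partial correlation coefficient $\rho_{k,k+1 \cdot 12 \cdots (k-1)}$ between $x_k$ and $x_{k+1}$ with
$x_1, \ldots, x_{k-1}$ held constant, is the ordinary correlation coefficient between $y_1$ and $y_2$.
Show that

$\rho_{k,k+1 \cdot 12 \cdots (k-1)} = \dfrac{\Delta_{k,k+1}}{\sqrt{\Delta_{kk} \, \Delta_{k+1,k+1}}}$

where $\Delta_{pq}$, $p, q = k, k + 1$, is the minor one obtains by deleting the $p$th row
and $q$th column of the determinant

$\begin{vmatrix} 1 & \rho_{12} & \cdots & \rho_{1,k+1} \\ \rho_{21} & 1 & \cdots & \rho_{2,k+1} \\ \vdots & & & \vdots \\ \rho_{k+1,1} & \rho_{k+1,2} & \cdots & 1 \end{vmatrix}.$

3.24 If $(x_1, \ldots, x_k)$ is a $k$-dimensional random variable such that the
correlation coefficient between each pair of components is $\rho$, show that the multiple
correlation coefficient between any component and the remaining components is

$\dfrac{\sqrt{(k - 1)\rho^2}}{\sqrt{1 + (k - 2)\rho}}.$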

3.25 In the preceding problem suppose the correlation coefficient between 
and each of the components ^ 2 ^... is whereas the correlation coefficient 
between each pair of the components (xg, ..., is p. Show that Pip > 0 and 
that the multiple correlation coefficient between and • • •»is 

V(A: - Dp! 

Vl +(A: -2)p' 

3.26 In Problem 3.25 suppose the correlation coefficient between $x_1$ and $x_r$
is $\rho_r$, $r = 2, \ldots, k$, while $x_2, \ldots, x_k$ are mutually independent. Show that the
multiple correlation coefficient between $x_1$ and $(x_2, \ldots, x_k)$ is $\sqrt{\rho_2^2 + \cdots + \rho_k^2}$.

3.27 Reduce to simplest forms the equations of the least squares linear
regression hyperplanes of $x_1$ on $x_2, \ldots, x_k$ in Problems 3.25 and 3.26.

3.28 Verify that the equations given by (3.8.18) and (3.8.19) are equivalent. 



CHAPTER 4 


Sequences of Random Variables 


4.1 DEFINITION OF A STOCHASTIC PROCESS

An important class of problems in mathematical statistics is the
determination of limiting distribution functions of certain functions of $n$
random variables as $n \to \infty$. More precisely, if $(x_1, \ldots, x_n)$ is
an $n$-dimensional random variable and $g_n(x_1, \ldots, x_n)$ is a function
of $(x_1, \ldots, x_n)$ which is itself a random variable, the problem is to
determine the limiting c.d.f. of $g_n(x_1, \ldots, x_n)$ as $n \to \infty$, or at least certain
properties of the c.d.f., if such a c.d.f. exists. Thus, it will be convenient
to deal with random variables having infinitely many components. Such
random variables are called stochastic processes. A stochastic process
with a countably infinite number of components is often referred to as
a sequence of random variables.

More precisely stated, a stochastic process is a family of random variables
$\{x_\alpha \mid \alpha \in A\}$ where the range $A$ may be an interval on the real axis, or a
sequence of points on the real axis such as a sequence of integers, or
even a more general set of points, such that every finite collection of
components $(x_{\alpha_1}, \ldots, x_{\alpha_m})$ is a set of random variables with a specified
c.d.f. $F_{\alpha_1 \cdots \alpha_m}(x_{\alpha_1}, \ldots, x_{\alpha_m})$. The c.d.f.'s of the collection of all finite
sets of random variables must be consistent, of course, in the sense that
the c.d.f. of any finite set of random variables, say $(x_{\alpha_1}, \ldots, x_{\alpha_m})$, must be
identically the same c.d.f. as the marginal c.d.f. of this set obtained from
the c.d.f. of any finite set of random variables which contains $(x_{\alpha_1}, \ldots, x_{\alpha_m})$
as a subset.
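The consistency requirement can be made concrete in a discrete setting,
where probability tables take the place of c.d.f.'s. The sketch below is a
hypothetical illustration, not part of the definition: it specifies the
distribution of $(x_1, \ldots, x_n)$ for a two-state process by a starting
distribution and one-step conditional probabilities (both invented for the
example), and checks that the table for $(x_1, x_2)$ coincides with the marginal
table obtained from that of $(x_1, x_2, x_3)$.

```python
from fractions import Fraction as F
from itertools import product

start = {0: F(1, 3), 1: F(2, 3)}                      # hypothetical P(x_1 = a)
step = {0: {0: F(1, 2), 1: F(1, 2)},                  # hypothetical P(x_{i+1} = b | x_i = a)
        1: {0: F(1, 4), 1: F(3, 4)}}

def pmf(n):
    """Specified distribution of (x_1, ..., x_n) for the two-state process."""
    table = {}
    for path in product((0, 1), repeat=n):
        p = start[path[0]]
        for a, b in zip(path, path[1:]):
            p *= step[a][b]
        table[path] = p
    return table

def marginal(table, keep):
    """Sum out every coordinate whose index is not listed in keep."""
    out = {}
    for path, p in table.items():
        key = tuple(path[i] for i in keep)
        out[key] = out.get(key, 0) + p
    return out

# Consistency: the table for (x_1, x_2) must equal the marginal of (x_1, x_2, x_3).
consistent = marginal(pmf(3), keep=(0, 1)) == pmf(2)
print(consistent)
```

Only one pair of finite sets is checked here; the definition requires the
same agreement for every finite set of components and every finite set
containing it.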

4.2 PROBABILITY MEASURE FOR A STOCHASTIC PROCESS

Since we shall deal mostly with stochastic processes $(x_1, x_2, \ldots)$ with a
countably infinite number of components, there will be no ambiguity if
we denote by $R_\infty$ the product space $R_1^{(1)} \times R_1^{(2)} \times \cdots$ where $R_1^{(1)}, R_1^{(2)}, \ldots$
are (one-dimensional) sample spaces of $x_1, x_2, \ldots$, respectively. Then
any countably infinite sequence of real numbers $(b_1, b_2, \ldots)$ is a point in $R_\infty$,
and, conversely, any point in $R_\infty$ is a sequence of real numbers. Let $b$
denote such a point. Thus, every realization of a stochastic process
$(x_1, x_2, \ldots)$ specified by the values of a countably infinite number of
components is a sample point in $R_\infty$. We may refer to $R_\infty$ as the sample
space of such a stochastic process.

Consider any finite set of coordinates . .., b^^J of b; it represents 

a point b' in the w-dimensional Euclidean space B^^^^ . X • • • 

X The point b' is the projection of b in B^ into Bj^^.which 

we may write as 

(4.2.1) y = 

The set of all points Fin B^ which project into a given set F' in Bj^^^ ••••“»") 
is called a cylinder set and may be written as 

(4.2.2) £ = 

If £' is a Borel set in the cylinder set £ in corresponding to 

£' (the preimage of F' in B^) is called a Borel cylinder set. The reader 

should note that F' is simply the projection of F in B^ into . 

If we extend the definition of a Borel cylinder set to the case where m = oo, 
it is noted that B^ itself is a Borel cylinder set. Furthermore if, for a 
finite w, £ is a Borel cylinder set, so is E, 

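Now suppose $E'$ and $E''$ are Borel sets in $R_m^{(\alpha_1, \ldots, \alpha_m)}$ and $R_n^{(\beta_1, \ldots, \beta_n)}$
and let $E_1$ and $E_2$ be corresponding cylinder sets in $R_\infty$. Consider the set
$E_1 \cup E_2$ in $R_\infty$. This is a cylinder set in $R_\infty$ corresponding to a (presently
to be defined) set of points $E^*$ in $R_k^{(\gamma_1, \ldots, \gamma_k)}$, where $\gamma_1, \ldots, \gamma_k$ is the set
of distinct integers among the integers $\alpha_1, \ldots, \alpha_m, \beta_1, \ldots, \beta_n$.
$R_m^{(\alpha_1, \ldots, \alpha_m)}$ and $R_n^{(\beta_1, \ldots, \beta_n)}$ are then marginal sample spaces of $R_k^{(\gamma_1, \ldots, \gamma_k)}$ and
$E^* = E_1' \cup E_2'$, where $E_1'$ is the cylinder set in $R_k^{(\gamma_1, \ldots, \gamma_k)}$ corresponding
to $E'$ in $R_m^{(\alpha_1, \ldots, \alpha_m)}$ and $E_2'$ is the cylinder set in $R_k^{(\gamma_1, \ldots, \gamma_k)}$ corresponding
to $E''$ in $R_n^{(\beta_1, \ldots, \beta_n)}$. The set $E_1 \cap E_2$ is the cylinder set in $R_\infty$ corresponding
to $E_1' \cap E_2'$ in $R_k^{(\gamma_1, \ldots, \gamma_k)}$. It should be noted that $E_1$ and $E_2$ are disjoint
if and only if $E_1' \cap E_2' = \emptyset$.

If $E'$ and $E''$ are Borel sets in $R_m^{(\alpha_1, \ldots, \alpha_m)}$ and $R_n^{(\beta_1, \ldots, \beta_n)}$ respectively,
whose preimages in $R_k^{(\gamma_1, \ldots, \gamma_k)}$ are $E_1'$ and $E_2'$, then $E_1' \cup E_2'$ is also a
Borel set in $R_k^{(\gamma_1, \ldots, \gamma_k)}$, and since $E_1 \cup E_2$ is the corresponding cylinder set
in $R_\infty$ this latter set is therefore a Borel cylinder set. Hence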

4.2.1 The class of Borel cylinder sets in $R_\infty$ is a Boolean field $\mathscr{F}$ of sets.

However, we are interested primarily in the Borel extension of this field
which we shall call $\mathscr{B}_\infty$. That is, the Borel field $\mathscr{B}_\infty$ is defined as the smallest
Borel class of sets in $R_\infty$ which contains the Boolean field $\mathscr{F}$. $\mathscr{B}_\infty$ is called
the class of Borel sets of $R_\infty$ and includes all Borel cylinder sets in $R_\infty$.

Now suppose $E'$ is a Borel set in the space $R_m^{(\alpha_1, \ldots, \alpha_m)}$ and let its
preimage in $R_\infty$ be $E$. To assign a probability to the Borel cylinder set $E$
we adopt the usual rule, namely,

(4.2.3)    $P(E) = P(E')$

where $P(E')$ is the probability associated with $E'$ as computed from
the c.d.f. of $(x_{\alpha_1}, \ldots, x_{\alpha_m})$, where $x_{\alpha_1}, \ldots, x_{\alpha_m}$ are the indicated components
from the stochastic process $(x_1, x_2, \ldots)$. The question arises as to how
probabilities are assigned to sets in $\mathscr{B}_\infty$ which are not Borel cylinder sets.
This question is answered by the following extension theorem due to
Kolmogorov (1933a):

4.2.2 Let $(x_{\alpha_1}, \ldots, x_{\alpha_m})$ be any finite collection of components from a
stochastic process $(x_1, x_2, \ldots)$ and let $R_m^{(\alpha_1, \ldots, \alpha_m)}$ be the sample
space of $(x_{\alpha_1}, \ldots, x_{\alpha_m})$. Then if $E$ is the Borel cylinder set in $R_\infty$
corresponding to any Borel set $E'$ in $R_m^{(\alpha_1, \ldots, \alpha_m)}$, let the probability
associated with $E$ be assigned in accordance with (4.2.3). Then there
exists a unique probability measure on $\mathscr{B}_\infty$, the Borel sets of $R_\infty$,
whose restriction to the cylinder sets in $R_\infty$ is the set function defined
by (4.2.3).

The proof of this theorem is omitted. The reader who is interested in 
the proof is referred to Doob (1953) and Kolmogorov (1933a). 

In setting up a stochastic process with a countably infinite number of
components as a mathematical model in an application, there is often a
physical process of some sort, just as with a finite stochastic process,
which "generates" the sequence of components $x_1, x_2, \ldots$ in such a way
that the conditions of 4.2.2 are assumed to be satisfied.

Examples. If a "true" die is rolled repeatedly, we can set up a very simple
stochastic process $(x_1, x_2, \ldots)$ where $x_\alpha$ is a random variable denoting the
number of dots obtained on the $\alpha$th throw, whereas $(x_{\alpha_1}, \ldots, x_{\alpha_r})$ has as its
p.f. the function $p(x_{\alpha_1}, \ldots, x_{\alpha_r}) = 1/6^r$ at each of the $6^r$ mass points of the
$r$-dimensional random variable $(x_{\alpha_1}, \ldots, x_{\alpha_r})$, for any choice of $\alpha_1, \ldots, \alpha_r$ and
any finite $r$.

If we let $x_1$ be a random variable having a rectangular distribution on $(0, 1)$,
$x_2$ a random variable having a rectangular distribution on $(x_1, 1)$, and in general
$x_\alpha$ a random variable having a rectangular distribution on $(x_{\alpha-1}, 1)$, $\alpha = 1, 2, \ldots$,
then $(x_1, x_2, \ldots)$ is an example of an important kind of stochastic process called
a Markov chain of order 1. Note that for any $\alpha$ the p.d.f. of the conditional random
variable $x_\alpha \mid x_{\alpha-1}$, namely, $1/(1 - x_{\alpha-1})$, is exactly the same as the p.d.f. of the
conditional random variable $x_\alpha \mid x_{\alpha-1}, x_{\beta_1}, \ldots, x_{\beta_r}$, where $\beta_1, \ldots, \beta_r$ is any set of
$r \le \alpha - 2$ of the integers $1, 2, \ldots, \alpha - 2$. A Markov chain of order $k$ is a
stochastic process $(x_1, x_2, \ldots)$ which has the property that for any $\alpha$ the
conditional random variable $x_\alpha \mid x_{\alpha-1}, \ldots, x_{\alpha-k}$ has the same distribution as the
conditional random variable $x_\alpha \mid x_{\alpha-1}, \ldots, x_{\alpha-k}, x_{\beta_1}, \ldots, x_{\beta_r}$, where $\beta_1, \ldots, \beta_r$ is
any set of $r \le \alpha - k - 1$ of the integers $1, \ldots, \alpha - k - 1$. For discussion of the
general theory of Markov chains or Markov processes the reader is referred to
the books by Doob (1953), Feller (1957), and Loève (1955).
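
The second example can be simulated directly. The following is a minimal sketch, not part of the original text: the function name `simulate_chain`, the seed, and the convention $x_0 = 0$ (which makes the first component rectangular on (0, 1)) are assumptions made only for illustration.

```python
import random

def simulate_chain(n_steps, rng):
    """Draw x_1, ..., x_n where x_a is rectangular (uniform) on (x_{a-1}, 1)."""
    path = []
    previous = 0.0  # assumed convention x_0 = 0, so x_1 is uniform on (0, 1)
    for _ in range(n_steps):
        # The next component depends on the past only through `previous`,
        # which is the order-1 Markov property described in the example.
        current = rng.uniform(previous, 1.0)
        path.append(current)
        previous = current
    return path

rng = random.Random(0)  # fixed seed so the run is reproducible
path = simulate_chain(10, rng)
print(path)
```

Each realized sequence is nondecreasing and stays in the unit interval, as the construction requires.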

When we have a stochastic process $(x_1, x_2, \ldots)$ and occasion to discuss
the probabilities associated with Borel sets in the (Euclidean) spaces of
finite collections of $x$'s, we can always consider the preimages of these sets
in $R_\infty$ and hence we can always talk about sets in $R_\infty$ and their associated
probabilities. For instance, instead of talking about the (Borel) set in the
sample space of $(x_m, x_n)$ for which $|x_m - x_n| < \varepsilon$ we can talk about the (Borel cylinder)
set in $R_\infty$ for which $|x_m - x_n| < \varepsilon$. The probabilities assigned to the two
sets are equal by definition. In many situations it is a convenience
to be able to refer to preimage events in $R_\infty$ rather than to events in
specific finite dimensional spaces corresponding to various finite collections
of $x$'s from $(x_1, x_2, \ldots)$; $R_\infty$ is a fixed sample space in which events
regarding $(x_1, x_2, \ldots)$ occur, whereas the sample space of $(x_1, \ldots, x_n)$
changes with $n$. There are, however, other reasons than convenience in
considering $R_\infty$. Indeed, as will be seen later, it is only in $R_\infty$ that certain
problems and theorems can even be formulated.

4.3 CONVERGENCE IN PROBABILITY 

(a) Some Criteria for Convergence in Probability

Suppose $(x, x_1, x_2, \ldots)$ is a stochastic process such that for arbitrary
$\varepsilon > 0$

(4.3.1)    $\lim_{n \to \infty} P(|x_n - x| > \varepsilon) = 0.$

Then $(x_1, x_2, \ldots)$ is said to converge stochastically, or to converge in
probability, to the random variable $x$, and we sometimes denote this type
of convergence briefly by writing

(4.3.1a)    $\operatorname{p\,lim}_{n \to \infty} x_n = x.$

If $x$ is a degenerate random variable such that $P(x = x_0) = 1$, then the
stochastic process $(x_1, x_2, \ldots)$ converges in probability to the constant $x_0$.

One of the simplest and most important examples of the convergence of
a stochastic process to a constant is the weak law of large numbers stated
as follows:

4.3.1 Let $(x_1, x_2, \ldots)$ be a sequence of independent random variables having
0 means and variances $\sigma_1^2, \sigma_2^2, \ldots$. Let $c_n^2 = \sigma_1^2 + \cdots + \sigma_n^2$ and
$\lim_{n \to \infty} c_n^2 / n^2 = 0$. Let $\bar{x}_n = (x_1 + \cdots + x_n)/n$. Then the stochastic
process $(\bar{x}_1, \bar{x}_2, \ldots)$ converges in probability to 0.

To prove this we note that $\mathscr{E}(\bar{x}_n) = 0$ and $\sigma^2(\bar{x}_n) = c_n^2 / n^2$. Hence it
follows by Chebyshev's inequality (3.3.5) that for an arbitrary $\varepsilon > 0$

    $P(|\bar{x}_n| < \varepsilon) \ge 1 - \frac{c_n^2}{n^2 \varepsilon^2}.$

Thus, if $c_n^2 / n^2 \to 0$ as $n \to \infty$, we have $\lim_{n \to \infty} P(|\bar{x}_n| < \varepsilon) = 1$, which
concludes the argument.
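
The Chebyshev bound used in the proof of 4.3.1 can be compared with a simulated frequency. The sketch below is illustrative only: it assumes unit-variance normal components, and the helper names `chebyshev_lower_bound` and `empirical_probability`, the sample sizes, and the seed are all hypothetical choices rather than anything prescribed by the text.

```python
import random

def chebyshev_lower_bound(variances, eps):
    """Lower bound 1 - c_n^2 / (n^2 eps^2) for P(|mean of n components| < eps)."""
    n = len(variances)
    c_n_sq = sum(variances)  # c_n^2 = sigma_1^2 + ... + sigma_n^2
    return 1.0 - c_n_sq / (n * n * eps * eps)

def empirical_probability(n, eps, trials, rng):
    """Fraction of simulated means of n independent N(0, 1) draws with |mean| < eps."""
    hits = 0
    for _ in range(trials):
        mean = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        if abs(mean) < eps:
            hits += 1
    return hits / trials

rng = random.Random(1)  # fixed seed for reproducibility
n, eps = 400, 0.2
bound = chebyshev_lower_bound([1.0] * n, eps)  # 1 - 400 / (160000 * 0.04)
estimate = empirical_probability(n, eps, 2000, rng)
print(bound, estimate)
```

The bound is crude: it only has to hold for every distribution with the stated variances, so the simulated frequency for normal components is expected to sit well above it.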

Actually, 4.3.1 holds if the covariance between each pair of components
of $(x_1, x_2, \ldots)$ is zero.

If the c.d.f.'s of the components of $(x_1, x_2, \ldots)$ are $F_1(x), F_2(x), \ldots$,
and if $\lim_{n \to \infty} F_n(x) = F(x)$ at every point of continuity of a c.d.f. $F(x)$, then
$(x_1, x_2, \ldots)$ is said to converge in distribution to $F(x)$.

Several theorems about convergence in probability and convergence in 
distribution follow. They are useful for later chapters. 

One of the simplest criteria for convergence in probability can be stated 
as follows: 

4.3.2 If $(x, x_1, x_2, \ldots)$ is a stochastic process, then $(x_1, x_2, \ldots)$ converges
in probability to the random variable $x$ if

(4.3.2)    $\lim_{n \to \infty} \mathscr{E}(x_n - x)^2 = 0.$

It follows from Chebyshev's inequality stated in 3.3.2 that

(4.3.3)    $P(|x_n - x| > \varepsilon) \le \frac{\mathscr{E}(x_n - x)^2}{\varepsilon^2}$

and since (4.3.2) is assumed, we have $\lim_{n \to \infty} P(|x_n - x| > \varepsilon) \le 0$, which
implies (4.3.1). If $(x, x_1, x_2, \ldots)$ is a stochastic process which satisfies
(4.3.2) with $\mathscr{E}(x_n^2) < +\infty$, $n = 1, 2, \ldots$, and $\mathscr{E}(x^2) < +\infty$, then $(x_1, x_2, \ldots)$
is said to converge in the mean to $x$. This type of convergence is sometimes
denoted briefly by writing

(4.3.2a)    $\operatorname{l.i.m.}_{n \to \infty} x_n = x.$
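
Inequality (4.3.3) can be checked exactly in a simple case. The sketch below assumes, purely for illustration, that $x_n - x$ is normally distributed with mean 0 and variance $1/n^2$, so that the tail probability has a closed form through the complementary error function; the function names are hypothetical.

```python
import math

def exact_tail(n, eps):
    """P(|x_n - x| > eps) when x_n - x ~ N(0, 1/n^2) (an assumed model)."""
    # For Z ~ N(0, 1), P(|Z| > a) = erfc(a / sqrt(2)); here a = n * eps.
    return math.erfc(n * eps / math.sqrt(2.0))

def second_moment_bound(n, eps):
    """Right-hand side of (4.3.3): E(x_n - x)^2 / eps^2 with E(x_n - x)^2 = 1/n^2."""
    return (1.0 / n ** 2) / eps ** 2

eps = 0.5
rows = [(n, exact_tail(n, eps), second_moment_bound(n, eps)) for n in range(1, 51)]
for n, tail, bound in rows[:5]:
    print(n, tail, bound)
```

Both columns tend to zero as $n$ grows, and the exact tail never exceeds the second-moment bound, which is the content of (4.3.3) in this special case.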


4.3.3 If $(x, x', x_1, x_2, \ldots)$ is a stochastic process such that $(x_1, x_2, \ldots)$
converges in probability to each of the random variables $x$ and $x'$,
then $x$ and $x'$ are equivalent random variables.

To prove this we first note that

(4.3.4)    $P\left(|x - x'| > \frac{1}{N}\right) = P\left(|(x_n - x') - (x_n - x)| > \frac{1}{N}\right).$

Now let $E$, $E'$, $E''$ be the sets of points in the sample space $R_\infty$ of
$(x, x', x_1, x_2, \ldots)$ for which $|(x_n - x') - (x_n - x)| > \frac{1}{N}$, $|x_n - x| > \frac{1}{2N}$,
and $|x_n - x'| > \frac{1}{2N}$, respectively. Then it will be seen that $E \subset (E' \cup E'')$.
Hence $P(E) \le P(E') + P(E'')$, that is,

(4.3.5)    $P\left(|x - x'| > \frac{1}{N}\right) \le P\left(|x_n - x| > \frac{1}{2N}\right) + P\left(|x_n - x'| > \frac{1}{2N}\right).$

By letting $n \to \infty$ both members on the right have limits equal to zero by
hypothesis. Therefore

    $P\left(|x - x'| > \frac{1}{N}\right) = 0.$

Denoting the set in $R_\infty$ for which $|x - x'| > \frac{1}{N}$ by $G_N$, we have $G_1 \subset G_2 \subset \cdots$
and $\lim_{N \to \infty} G_N = G$, where $G$ is the set of all points in $R_\infty$ for
which $x \ne x'$. Therefore, by 1.4.6 for the case of sets in $R_\infty$ we have

(4.3.6)    $P(x \ne x') = P(G_1 \cup G_2 \cup \cdots) \le P(G_1) + P(G_2) + \cdots.$

But $P(G_1) = P(G_2) = \cdots = 0$. Therefore, $P(x \ne x') = 0$, and hence $x$
and $x'$ are equivalent random variables [see Section 2.8(b)].

Another useful result is that convergence in probability implies convergence
in distribution, or more precisely stated:

4.3.4 Suppose $(x, x_1, x_2, \ldots)$ is a stochastic process with components
having c.d.f.'s $F(x), F_1(x), F_2(x), \ldots$, respectively. If $(x_1, x_2, \ldots)$ converges
in probability to the random variable $x$, then the sequence of
c.d.f.'s $F_1(x), F_2(x), \ldots$ converges to $F(x)$ at every point of continuity
of $F(x)$.

As usual, let $R_\infty$ be the sample space of the stochastic process $(x, x_1, x_2, \ldots)$.
Suppose $F(x)$ is continuous at $x = x_0$ and let $x'$ be a constant
such that $x' < x_0$. Now consider the three events for which: $x \le x'$; $x_n \le x_0$;
and $|x_n - x| > (x_0 - x')$ in $R_\infty$. The first event is contained in the
union of the second and third. Therefore we have

(4.3.7)    $F(x') = P(x \le x') \le P(x_n \le x_0) + P(|x_n - x| > (x_0 - x')).$

Now take limits as $n \to \infty$. Since $(x_1, x_2, \ldots)$ converges in probability to
the random variable $x$, we have $\lim_{n \to \infty} P(|x_n - x| > (x_0 - x')) = 0$. Therefore,

(4.3.8)    $F(x') \le \liminf_{n \to \infty} F_n(x_0).$

Similarly, by taking $x'' > x_0$, we find

(4.3.9)    $F(x'') \ge \limsup_{n \to \infty} F_n(x_0)$

and hence

(4.3.10)    $F(x') \le \liminf_{n \to \infty} F_n(x_0) \le \limsup_{n \to \infty} F_n(x_0) \le F(x'').$

But since $F(x)$ is continuous at $x = x_0$ we have $\lim_{x' \to x_0} F(x') = \lim_{x'' \to x_0} F(x'') = F(x_0)$.
Therefore, we have

(4.3.11)    $\lim_{n \to \infty} F_n(x_0) = F(x_0)$

which completes the argument for 4.3.4.

(b) Convergence of Functions of Components in Stochastic Processes 

If $(x, x_1, x_2, \ldots)$ is a stochastic process such that $(x_1, x_2, \ldots)$ converges
in probability to the random variable $x$ we sometimes need to know what
conditions will insure the convergence in probability of $(g(x_1), g(x_2), \ldots)$
to $g(x)$. We shall consider several theorems relating to this and similar
questions which will be used in later chapters.

4.3.5 Suppose $(x, x_1, x_2, \ldots)$ is a stochastic process such that $(x_1, x_2, \ldots)$
converges in probability to the random variable $x$. Let $g(x)$ be a
continuous function of $x$ on $R_1$. Then

(4.3.12)    $\operatorname{p\,lim}_{n \to \infty} g(x_n) = g(x).$

If $x$ is a degenerate random variable, namely, a constant $c$, then (4.3.12)
simply states that $(g(x_1), g(x_2), \ldots)$ converges in probability to the constant
$g(c)$.

To prove 4.3.5 we note that $g(x)$ is uniformly continuous on any closed
interval, say $[-M, M]$. Since $x$ is a random variable we can, for an
arbitrary $\varepsilon > 0$, choose $M$ so that

(4.3.13)    $P(|x| > M) < \frac{\varepsilon}{2}.$

For such a choice of $\varepsilon$ and $M$ there exists a $\delta(\varepsilon, M)$ such that if

    (i)    $|x| \le M$
    (ii)   $|x_n - x| < \delta(\varepsilon, M)$

then

    (iii)  $|g(x_n) - g(x)| < \varepsilon.$

If we denote the sets in $R_\infty$ for which (i), (ii), and (iii) hold by $E_1$, $E_2$, and
$E_3$, respectively, then it is seen that $\bar{E}_3 \subset (\bar{E}_1 \cup \bar{E}_2)$, from which we have
$P(\bar{E}_3) \le P(\bar{E}_1) + P(\bar{E}_2)$, that is,

(4.3.14)    $P(|g(x_n) - g(x)| \ge \varepsilon) \le P(|x_n - x| \ge \delta(\varepsilon, M)) + P(|x| > M).$

But there is an $n(\varepsilon, \delta, M)$ such that for $n > n(\varepsilon, \delta, M)$ we have

(4.3.15)    $P(|x_n - x| \ge \delta(\varepsilon, M)) < \frac{\varepsilon}{2}.$

Since $M$ has been chosen so as to satisfy (4.3.13), we finally obtain

(4.3.16)    $P(|g(x_n) - g(x)| \ge \varepsilon) < \varepsilon$

for $n > n(\varepsilon, \delta, M)$, which is equivalent to (4.3.12), thus establishing 4.3.5.

It should be noted that 4.3.5 can be stated in more general form by
requiring that $g(x)$ be continuous on a closed interval $I$ where $P(x \in I) = 1$.
Since this would involve only minor modifications of the argument it is
left to the reader.

Sometimes we have to deal with convergence problems involving
stochastic processes whose components are vectors. For instance, if
$(x, y; x_1, y_1; x_2, y_2; \ldots)$ is such a stochastic process, we might be concerned
with conditions under which $(g(x_1, y_1), g(x_2, y_2), \ldots)$ converges in
probability to $g(x, y)$. In this case $R_\infty$ is the sample space of
$(x, y; x_1, y_1; x_2, y_2; \ldots)$.

If $(z; x_1, y_1; x_2, y_2; \ldots)$ is a stochastic process such that for an
arbitrary $\varepsilon > 0$

    $\lim_{n \to \infty} P(|x_n - y_n| > \varepsilon) = 0,$

while one of the sequences $(x_1, x_2, \ldots)$ or $(y_1, y_2, \ldots)$ converges in
probability to the random variable $z$, then it can be readily verified that
the other sequence also converges in probability to the random variable $z$.
In such a situation it is convenient to say that $(x_1, x_2, \ldots)$ and $(y_1, y_2, \ldots)$
converge in probability together to the random variable $z$. Similarly, if the
above limit holds and one of the two sequences converges in distribution
to $F(x)$, we shall say that both sequences converge in distribution together
to $F(x)$.

We may state the extension of 4.3.5 to the case of several sequences of 
random variables as follows: 

4.3.6 Suppose $(x^{(1)}, \ldots, x^{(k)}; x_1^{(1)}, \ldots, x_1^{(k)}; x_2^{(1)}, \ldots, x_2^{(k)}; \ldots)$ is a
$k$-dimensional vector stochastic process which converges in probability
to the $k$-dimensional random variable $(x^{(1)}, \ldots, x^{(k)})$. Let $g(x^{(1)}, \ldots, x^{(k)})$
be a continuous function of $(x^{(1)}, \ldots, x^{(k)})$ in $R_k$. Then

(4.3.17)    $\operatorname{p\,lim}_{n \to \infty} g(x_n^{(1)}, \ldots, x_n^{(k)}) = g(x^{(1)}, \ldots, x^{(k)}).$

The proof of this theorem is similar to that of 4.3.5 and is left to the
reader.

Again it is to be noted that we can relax the conditions of 4.3.6 by
requiring that $g(x^{(1)}, \ldots, x^{(k)})$ be continuous in a closed $k$-dimensional
interval $I_k$ for which $P((x^{(1)}, \ldots, x^{(k)}) \in I_k) = 1$.

The following corollary of 4.3.6 will be particularly useful in later 
chapters: 

4.3.6a If $(x, c; x_1, y_1; x_2, y_2; \ldots)$ is a vector stochastic process such that
$(x_1, y_1; x_2, y_2; \ldots)$ converges in probability to $(x, c)$ where $x$ has
c.d.f. $F(x)$ and where $c$ is a positive constant, then at every point of
continuity of $F(x)$,

(4.3.18)    $\lim_{n \to \infty} P(x_n + y_n \le z) = F(z - c)$

(4.3.19)    $\lim_{n \to \infty} P(x_n y_n \le z) = F\left(\frac{z}{c}\right)$

(4.3.20)    $\lim_{n \to \infty} P\left(\frac{x_n}{y_n} \le z\right) = F(cz)$

(4.3.21)    $\lim_{n \to \infty} P(a x_n + b y_n \le z) = F\left(\frac{z - bc}{a}\right)$

where $a$ and $b$ are constants and $a > 0$.
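
The limits (4.3.18) to (4.3.20) can be illustrated by simulation. The sketch below is only an assumption-laden example: it takes $x$ rectangular on (0, 1), so that $F(u) = u$ there, perturbs $x$ and $c$ by normal noise of order $1/n$ to obtain $x_n$ and $y_n$, and picks each $z$ so that the limiting value is $F(0.5) = 0.5$. The function name `limit_checks` and all numerical settings are hypothetical.

```python
import random

def limit_checks(n, c, trials, rng):
    """Monte Carlo frequencies for the events in (4.3.18), (4.3.19), (4.3.20)."""
    sum_hits = prod_hits = ratio_hits = 0
    for _ in range(trials):
        x = rng.random()                   # x uniform on (0, 1), so F(u) = u on (0, 1)
        x_n = x + rng.gauss(0.0, 1.0) / n  # x_n converges in probability to x
        y_n = c + rng.gauss(0.0, 1.0) / n  # y_n converges in probability to c
        sum_hits += x_n + y_n <= 0.5 + c   # z = 0.5 + c gives F(z - c) = F(0.5)
        prod_hits += x_n * y_n <= 0.5 * c  # z = 0.5 c gives F(z / c) = F(0.5)
        ratio_hits += x_n / y_n <= 0.5 / c # z = 0.5 / c gives F(c z) = F(0.5)
    return sum_hits / trials, prod_hits / trials, ratio_hits / trials

rng = random.Random(2)  # fixed seed for reproducibility
freq_sum, freq_prod, freq_ratio = limit_checks(n=1000, c=2.0, trials=20000, rng=rng)
print(freq_sum, freq_prod, freq_ratio)
```

All three frequencies should be close to 0.5, up to Monte Carlo error of a few thousandths.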


The following theorem will be of interest later: 


4.3.7 Suppose $(x_1^{(1)}, \ldots, x_1^{(k)}; x_2^{(1)}, \ldots, x_2^{(k)}; \ldots)$
is a vector stochastic process such that

(4.3.22)    ...,    $i = 1, \ldots, k$;  $n = 1, 2, \ldots$

and $(y_1^{(1)}, \ldots, y_1^{(k)}; y_2^{(1)}, \ldots, y_2^{(k)}; \ldots)$ converges in probability to
the vector constant $(c^{(1)}, \ldots, c^{(k)})$. Let $g(x^{(1)}, \ldots, x^{(k)})$ be a single-valued
function defined at all points in $R_k$ and continuous in some
open $k$-dimensional rectangular interval containing $(c^{(1)}, \ldots, c^{(k)})$.
Then

(4.3.23)    $\operatorname{p\,lim}_{n \to \infty} g(y_n^{(1)}, \ldots, y_n^{(k)}) = g(c^{(1)}, \ldots, c^{(k)}).$

The proof of this theorem involves no particular difficulties and is
omitted.


It should be pointed out that the conditions in 4.3.7 can be relaxed by
assuming $g(x^{(1)}, \ldots, x^{(k)})$ to be defined only in some closed $k$-dimensional
interval $I_k$ containing $(c^{(1)}, \ldots, c^{(k)})$ for which $P((x^{(1)}, \ldots, x^{(k)}) \in I_k) = 1$,
and such that $g(x^{(1)}, \ldots, x^{(k)})$ is continuous in $I_k$.

Suppose $(g(x, \theta), g(x_1, \theta), g(x_2, \theta), \ldots)$ is a stochastic process for every
$\theta$ in some interval $(\theta', \theta'')$. If for an arbitrary $\varepsilon > 0$ there is an $n_\varepsilon$ such
that for each $\theta$ in $(\theta', \theta'')$

(4.3.24)    $P(|g(x_n, \theta) - g(x, \theta)| > \varepsilon) < \varepsilon$

for $n > n_\varepsilon$, then $(g(x_1, \theta), g(x_2, \theta), \ldots)$ is said to converge in probability to
$g(x, \theta)$ uniformly with respect to $\theta$ in $(\theta', \theta'')$. This notion can be extended
in an obvious manner to situations where $g(x, \theta)$ is degenerate and also
to those where $x$ is a vector random variable.

Finally, we shall find the following theorem useful in later chapters, 
particularly Chapters 12 and 13: 

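4.3.8 Let $(x_1, x_2, \ldots)$ be a stochastic process depending on a parameter $\theta$
such that for each value of $\theta$ in some interval $(\theta', \theta'')$, $f_n(x_1, \ldots, x_n, \theta)$,
$n = 1, 2, \ldots$, and $\theta_n^*(x_1, \ldots, x_n)$, $n = 1, 2, \ldots$, are sequences of
random variables converging in probability respectively to $g(\theta)$ and
$\theta$ uniformly with respect to $\theta$ in $(\theta', \theta'')$, where $g(\theta)$ is continuous in
$(\theta', \theta'')$. Then $f_n(x_1, \ldots, x_n, \theta_n^*)$, $n = 1, 2, \ldots$, converges in probability
to $g(\theta)$.

Consider the stochastic process $(x_1, x_2, \ldots)$ at any point $\theta_0$ in $(\theta', \theta'')$.
Then the sequence $(\theta_1^*(x_1), \theta_2^*(x_1, x_2), \ldots)$ converges in probability to $\theta_0$.
Thus for an arbitrary $\varepsilon > 0$ such that $(\theta_0 - \varepsilon, \theta_0 + \varepsilon)$ is in $(\theta', \theta'')$ there
is an $n_\varepsilon$ such that we have (writing $\theta_n^*$ for $\theta_n^*(x_1, \ldots, x_n)$) for $n > n_\varepsilon$

(4.3.25)    $P(\theta_0 - \varepsilon < \theta_n^* < \theta_0 + \varepsilon) > 1 - \varepsilon.$

Now let $g_U(\theta_0)$ be the least upper bound and $g_L(\theta_0)$ the greatest lower
bound of $g(\theta)$ for values of $\theta$ in $(\theta_0 - \varepsilon, \theta_0 + \varepsilon)$. Then since $(f_1(x_1, \theta),
f_2(x_1, x_2, \theta), \ldots)$ converges in probability to $g(\theta)$ uniformly with respect
to $\theta$ in $(\theta', \theta'')$, there exists, for an arbitrary $\varepsilon' > 0$, an $n_{\varepsilon'}$ such that for
$n > n_{\varepsilon'}$, and for any $\theta$ in $(\theta_0 - \varepsilon, \theta_0 + \varepsilon)$,

(4.3.26)    $P(g_L(\theta_0) - \varepsilon' < f_n(x_1, \ldots, x_n, \theta) < g_U(\theta_0) + \varepsilon') > 1 - \varepsilon'.$

Now for any $n > \max(n_\varepsilon, n_{\varepsilon'})$ let $E_n$ be the set in $R_\infty$, the sample space
of $(x_1, x_2, \ldots)$, for which both sets of inequalities indicated in (4.3.25) and
(4.3.26) hold. It is evident that $P(E_n) > 1 - \varepsilon - \varepsilon'$. But any point in $E_n$
will satisfy the following inequality:

(4.3.27)    $g_L(\theta_0) - \varepsilon' < f_n(x_1, \ldots, x_n, \theta_n^*) < g_U(\theta_0) + \varepsilon'.$

Hence for $n > \max(n_\varepsilon, n_{\varepsilon'})$ the probability that (4.3.27) is satisfied for
any $\theta_n^*$ in $(\theta_0 - \varepsilon, \theta_0 + \varepsilon)$ exceeds $1 - \varepsilon - \varepsilon'$. But since $\varepsilon$ and $\varepsilon'$
are arbitrary and since $g(\theta)$ is continuous in $(\theta', \theta'')$, the differences
$g_U(\theta_0) - g(\theta_0)$ and $g(\theta_0) - g_L(\theta_0)$ can be made arbitrarily small. Hence
$f_n(x_1, \ldots, x_n, \theta_n^*)$ converges in probability to $g(\theta_0)$. But $\theta_0$ is any point in
$(\theta', \theta'')$, thus concluding the argument for 4.3.8.

4.4 ALMOST CERTAIN CONVERGENCE

(a) Definition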

Let $(x_1, x_2, \ldots)$ be a stochastic process with associated probability
space $(R_\infty, \mathscr{B}_\infty, P)$, and let $E$ be the set of all sequences $(b_1, b_2, \ldots)$ which
converge. To see which points belong to $E$ let $E_{npN}$ be the set in $R_\infty$ for
which

(4.4.1)    $|b_{n+j} - b_n| < \frac{1}{N}$,    $j = 1, 2, \ldots, p$

where $N$ is a positive integer.

$E_{npN}$, for each $n$, $p$, and $N$, is a Borel cylinder set in $R_\infty$ which, of course,
belongs to $\mathscr{B}_\infty$. The set $E_{(N)} = \bigcup_{n=1}^{\infty} \bigcap_{p=1}^{\infty} E_{npN}$ belongs to $\mathscr{B}_\infty$ and is a
decreasing sequence for $N = 1, 2, \ldots$. The set of points $E$ mentioned
above is the limit of this decreasing sequence, that is,

(4.4.2)    $E = \lim_{N \to \infty} E_{(N)}$

which also belongs to $\mathscr{B}_\infty$ and hence has a probability $P(E)$. This
probability may be referred to as the probability of convergence of the
stochastic process $(x_1, x_2, \ldots)$. $E$ is called the convergence set in $R_\infty$.

Now suppose $P(E) = 1$ and let $x$ be a random variable which has the
value $\lim_{n \to \infty} b_n$ if $(b_1, b_2, \ldots)$ belongs to $E$ and has the value 0 if $(b_1, b_2, \ldots)$
does not belong to $E$. We then say that the sequence of random variables
$(x_1, x_2, \ldots)$ almost certainly converges to the random variable $x$.

Remark. The reader will find it helpful to interpret the preceding remarks in
terms of events in the original basic probability space $(R, \mathscr{B}, P)$. Denoting, as
usual, a sample point in $R$ by $e$, the stochastic process (all components being
measurable with respect to $\mathscr{B}$) may be written as $(x_1(e), x_2(e), \ldots)$. Thus, for
each sample point $e$ in $R$ we have a countably infinite sequence of random
variables which maps $e$ into a point in $R_\infty$. If there is a random variable $x(e)$
(measurable with respect to $\mathscr{B}$) such that $\lim_{n \to \infty} x_n(e) = x(e)$ for every point $e$ in $E'$,
the preimage (in $R$) of the set $E$ (in $R_\infty$), we say that $(x_1(e), x_2(e), \ldots)$ converges
to $x(e)$ with probability $P(E')$. $E'$ is the convergence set in $R$. If $P(E') = P(E) = 1$,
we say that $(x_1(e), x_2(e), \ldots)$ almost certainly converges or converges with
probability 1 to $x(e)$.

(b) Relation between Almost Certain Convergence and Convergence in
Probability

It will be noted that almost certain convergence is a stronger type of
convergence than convergence in probability. In fact

4.4.1 If $(x, x_1, x_2, \ldots)$ is a stochastic process such that $(x_1, x_2, \ldots)$ almost
certainly converges to $x$, the sequence converges in probability to $x$.

For if $b$ is a point in the convergence set $E$ of $(x_1, x_2, \ldots)$, then for any $\varepsilon > 0$,
$b$ is in the set in $R_\infty$ defined by the limit of the following expanding
sequence of sets in $R_\infty$: $[|x_{n+j} - x| < \varepsilon,\ j = 0, 1, 2, \ldots]$, $n = 1, 2, \ldots$.
But if in $R_\infty$, $E_n$ and $F_n$ denote the events $[|x_{n+j} - x| < \varepsilon,\ j = 0, 1, 2, \ldots]$
and $[|x_n - x| < \varepsilon]$ respectively, then $E_n \subset F_n$. Hence

(4.4.3)    $P(|x_n - x| < \varepsilon) \ge P(|x_{n+j} - x| < \varepsilon,\ j = 0, 1, 2, \ldots).$

Taking limits as $n \to \infty$, we obtain

(4.4.4)    $\lim_{n \to \infty} P(|x_n - x| < \varepsilon) \ge \lim_{n \to \infty} P(|x_{n+j} - x| < \varepsilon,\ j = 0, 1, 2, \ldots)
            = P(\lim_{n \to \infty} E_n) = P(E) = 1,$

that is,

(4.4.5)    $\lim_{n \to \infty} P(|x_n - x| < \varepsilon) = 1$

which is equivalent to (4.3.1), thus concluding the argument for 4.4.1.

4.5 KOLMOGOROV’S INEQUALITY 

The following interesting extension of notions in Chebyshev's inequality
to a set of inequalities is due to Kolmogorov (1928):

4.5.1 Suppose $x_1, \ldots, x_n$ is a set of independent random variables having
0 means and variances $\sigma_1^2, \ldots, \sigma_n^2$. Let $c_n^2 = \sigma_1^2 + \cdots + \sigma_n^2$. The
probability that all of the inequalities

(4.5.1)    $|x_1 + \cdots + x_\alpha| < \lambda c_n$,    $\alpha = 1, \ldots, n$

hold is at least $1 - 1/\lambda^2$.

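To prove this result let $F_\alpha$ be the event in the sample space $R_n$ for which

    $|x_1 + \cdots + x_\alpha| \ge \lambda c_n$,    $\alpha = 1, \ldots, n.$

Let $E_1 = F_1$, $E_2 = \bar{F}_1 \cap F_2$, $E_3 = \bar{F}_1 \cap \bar{F}_2 \cap F_3$, $\ldots$,
$E_n = \bar{F}_1 \cap \bar{F}_2 \cap \cdots \cap \bar{F}_{n-1} \cap F_n$, and let $G_n$ be the complement of
$E_1 \cup E_2 \cup \cdots \cup E_n$. $G_n$ is the set in $R_n$ for which all of the inequalities
in (4.5.1) hold. $E_1, \ldots, E_n, G_n$ are disjoint and their union is the entire
sample space $R_n$. Thus,

(4.5.2)    $P(G_n) = 1 - [P(E_1) + \cdots + P(E_n)].$

Let $z_\alpha$ be a random variable having the value 1 at each point in $E_\alpha$ and
0 at each point in $\bar{E}_\alpha$, $\alpha = 1, \ldots, n$, and $z_{n+1}$ a random variable similarly
defined for $G_n$. We then have

(4.5.3)    $\mathscr{E}(x_1 + \cdots + x_n)^2 = \mathscr{E}[z_1 (x_1 + \cdots + x_n)^2] + \cdots
            + \mathscr{E}[z_n (x_1 + \cdots + x_n)^2] + \mathscr{E}[z_{n+1} (x_1 + \cdots + x_n)^2].$

Any function of $(x_1, \ldots, x_\alpha)$ is independent of the remaining $x$'s.
Hence

    $\mathscr{E}[z_\alpha (x_1 + \cdots + x_\alpha) \cdot x_\beta] = \mathscr{E}[z_\alpha (x_1 + \cdots + x_\alpha)] \cdot \mathscr{E}(x_\beta) = 0$,    $\beta = \alpha + 1, \ldots, n.$

We have therefore

(4.5.4)    $\mathscr{E}[z_\alpha (x_1 + \cdots + x_n)^2] = \mathscr{E}[z_\alpha (x_1 + \cdots + x_\alpha)^2]
            + \mathscr{E}[z_\alpha (x_{\alpha+1} + \cdots + x_n)^2] \ge \mathscr{E}[z_\alpha (x_1 + \cdots + x_\alpha)^2].$

But $(x_1 + \cdots + x_\alpha)^2 \ge \lambda^2 c_n^2$ at each point in $E_\alpha$. Therefore we have for
$\alpha = 1, \ldots, n$

(4.5.5)    $\mathscr{E}[z_\alpha (x_1 + \cdots + x_n)^2] \ge \lambda^2 c_n^2 P(E_\alpha).$

Using the fact that $\mathscr{E}(x_1 + \cdots + x_n)^2 = c_n^2$, applying (4.5.5) to the
right-hand side of (4.5.3) for $\alpha = 1, \ldots, n$, and dropping
$\mathscr{E}[z_{n+1} (x_1 + \cdots + x_n)^2]$, we obtain

(4.5.6)    $c_n^2 \ge \lambda^2 c_n^2 [P(E_1) + \cdots + P(E_n)].$

It follows from (4.5.6) and (4.5.2) that

(4.5.7)    $P(G_n) \ge 1 - \frac{1}{\lambda^2}$

which completes the proof of 4.5.1.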
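
Kolmogorov's inequality 4.5.1 can be compared with a simulated frequency. The following sketch assumes unit-variance normal components, so that $c_n^2 = n$; the function names, the choice $\lambda = 2$, the number of trials, and the seed are hypothetical and serve only as an illustration.

```python
import random

def all_partial_sums_bounded(n, lam, rng):
    """True if |x_1 + ... + x_a| < lam * c_n for every a = 1, ..., n."""
    c_n = n ** 0.5  # unit variances assumed, so c_n^2 = n
    partial = 0.0
    for _ in range(n):
        partial += rng.gauss(0.0, 1.0)
        if abs(partial) >= lam * c_n:
            return False  # some inequality in (4.5.1) fails
    return True

def estimate(n, lam, trials, rng):
    """Monte Carlo estimate of the probability that all inequalities (4.5.1) hold."""
    return sum(all_partial_sums_bounded(n, lam, rng) for _ in range(trials)) / trials

rng = random.Random(3)  # fixed seed for reproducibility
lam = 2.0
p_hat = estimate(n=50, lam=lam, trials=5000, rng=rng)
bound = 1.0 - 1.0 / lam ** 2  # the guaranteed lower bound 1 - 1/lambda^2
print(p_hat, bound)
```

The estimate should exceed the bound of 0.75 comfortably, since the inequality must cover every distribution with the given variances and is not tight for normal components.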


4.6 THE STRONG LAW OF LARGE NUMBERS 


In Section 4.3 we referred to the weak law of large numbers which states 
that under certain conditions the mean of a sequence of independent 
random variables with 0 means converges in probability to 0. The question 
arises as to whether we can make a probability statement that all means 
in the sequence of means from some point sufficiently far out in the 
sequence are arbitrarily close to zero. This is answered by the strong law 
of large numbers, which may be stated as follows: 


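4.6.1 Let $(x_1, x_2, \ldots)$ be a sequence of independent random variables
having 0 means and variances $(\sigma_1^2, \sigma_2^2, \ldots)$ such that $\sum_{i=1}^{\infty} \sigma_i^2 / i^2 < +\infty$.
Let $\bar{x}_n = (x_1 + \cdots + x_n)/n$. Then $(\bar{x}_1, \bar{x}_2, \ldots)$ converges almost
certainly to 0. That is, for arbitrary $\delta > 0$ and $\varepsilon > 0$, there is an
$N_{\delta, \varepsilon}$ such that

(4.6.1)    $P(|\bar{x}_n| < \varepsilon,\ n = N, N+1, \ldots, N+k) > 1 - \delta$

for all $N > N_{\delta, \varepsilon}$ and for every $k$.

To prove 4.6.1 we follow a line of argument similar to that used by
Feller (1957). Let $E_N, E_{N+1}, \ldots, E_{N+k}$ be the events $|\bar{x}_n| < \varepsilon$,
$n = N, N+1, \ldots, N+k$, respectively. Then the probability in (4.6.1) is

(4.6.2)    $P(E_N \cap E_{N+1} \cap \cdots \cap E_{N+k}).$

But this probability has the value

(4.6.3)    $1 - P(\bar{E}_N \cup \bar{E}_{N+1} \cup \cdots \cup \bar{E}_{N+k}).$

Thus for $N > N_{\delta, \varepsilon}$ and all $k$ we must show that

(4.6.4)    $P(\bar{E}_N \cup \bar{E}_{N+1} \cup \cdots \cup \bar{E}_{N+k}) < \delta.$

Let us partition the positive integers into sets $I_1, I_2, \ldots$ where $I_\alpha$ is the
set $\{2^{\alpha-1} + 1, 2^{\alpha-1} + 2, \ldots, 2^\alpha\}$. Let $F_\alpha$ be the event for which at least
one of the inequalities $\{|\bar{x}_n| < \varepsilon,\ n \in I_\alpha\}$ fails. Then for some $\alpha$ and $\beta$ we
have

(4.6.5)    $\bar{E}_N \cup \bar{E}_{N+1} \cup \cdots \cup \bar{E}_{N+k} \subset F_\alpha \cup F_{\alpha+1} \cup \cdots \cup F_{\alpha+\beta}.$

Also, we have

(4.6.6)    $P(F_\alpha \cup F_{\alpha+1} \cup \cdots \cup F_{\alpha+\beta}) \le P(F_\alpha) + \cdots + P(F_{\alpha+\beta}).$

Thus, it is enough for $\sum_{\alpha=1}^{\infty} P(F_\alpha)$ to converge. Now $F_\alpha$ is the event that for
at least one $n$ in $I_\alpha$ we have $|\bar{x}_n| \ge \varepsilon$, which may be written

    $|x_1 + \cdots + x_n| \ge n\varepsilon$

which implies that

    $|x_1 + \cdots + x_n| \ge \left(\frac{2^{\alpha-1} \varepsilon}{c_{2^\alpha}}\right) c_{2^\alpha},$

where $c_{2^\alpha}^2$ is defined in 4.5.1 as $\sigma_1^2 + \cdots + \sigma_{2^\alpha}^2$.

But it follows from Kolmogorov's inequality that

    $P(F_\alpha) \le \frac{4 c_{2^\alpha}^2}{\varepsilon^2 2^{2\alpha}}.$

Hence

(4.6.7)    $\sum_{\alpha=1}^{\infty} P(F_\alpha) \le \frac{4}{\varepsilon^2} \sum_{\alpha=1}^{\infty} 2^{-2\alpha} \sum_{i=1}^{2^\alpha} \sigma_i^2.$

Reversing the order of summation with respect to $i$ and $\alpha$ and observing
that $\sum 2^{-2\alpha}$ over all positive integers $\alpha$ for which $2^\alpha \ge i$ does not exceed
$2/i^2$, we obtain

    $\sum_{\alpha=1}^{\infty} P(F_\alpha) \le \frac{8}{\varepsilon^2} \sum_{i=1}^{\infty} \frac{\sigma_i^2}{i^2}.$

Hence $\sum_{\alpha=1}^{\infty} P(F_\alpha)$ converges if $\sum_{i=1}^{\infty} \sigma_i^2 / i^2$ does, in which case the right-hand
side of (4.6.6) can be made $< \delta$ for all values of $\beta$ by choosing $\alpha$ sufficiently
large. It then follows from (4.6.5) and (4.6.6) that for sufficiently large $N$
(4.6.4) and (4.6.1) hold, thus completing the argument for 4.6.1.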
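
The step in the proof which bounds the sum of $2^{-2\alpha}$ over all positive integers $\alpha$ with $2^\alpha \ge i$ by $2/i^2$ can be checked numerically. The sketch below is only a sanity check over a finite range of $i$, with the infinite geometric series truncated after a fixed number of terms; the function name `tail_sum` and the range checked are arbitrary choices.

```python
def tail_sum(i, terms=60):
    """Truncated sum of 2^(-2a) over positive integers a with 2^a >= i."""
    a = 1
    while 2 ** a < i:  # find the smallest positive integer a with 2^a >= i
        a += 1
    # Geometric series with ratio 1/4 starting at 2^(-2a); 60 terms is ample.
    return sum(2.0 ** (-2 * (a + j)) for j in range(terms))

checks = [(i, tail_sum(i), 2.0 / i ** 2) for i in range(1, 1001)]
worst = max(s / b for _, s, b in checks)  # largest ratio of the sum to its bound
print(worst)
```

The ratio is largest when $i$ is a power of 2, where the sum equals $(4/3)/i^2$, so the bound $2/i^2$ holds with room to spare.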

It should be noted that if all components in $(x_1, x_2, \ldots)$ have equal
(finite) variances the condition $\sum_{i=1}^{\infty} \sigma_i^2 / i^2 < +\infty$ is automatically satisfied.
As a matter of fact it will be seen that $(\bar{x}_1, \bar{x}_2, \ldots)$ converges almost
certainly to 0 if $x_1, x_2, \ldots$ are independent and identically distributed with
0 means. See Khintchine (1929).


PROBLEMS 


4.1 If $(x, y; x_1, y_1; x_2, y_2; \ldots)$ is a stochastic process such that $(x_1, x_2, \ldots)$
and $(y_1, y_2, \ldots)$ converge in probability to the random variables $x$ and $y$,
respectively, where $x$ and $y$ are equivalent, show that for an arbitrary $\varepsilon > 0$,

    $\lim_{n \to \infty} P(|x_n - y_n| < \varepsilon) = 1.$

4.2 (Continuation) If $\lim_{n \to \infty} \mathscr{E}(x_n - y_n)^2 = 0$ and if $(x_1, x_2, \ldots)$ converges in
probability to the random variable $x$, show that $(y_1, y_2, \ldots)$ also converges in
probability to $x$.


4.3 If $(x_1, x_2, \ldots)$ are independent random variables having identical c.d.f.'s
$F(x)$ where

    $F(x) = 0$ for $x \le 0$,    $F(x) = x$ for $0 < x \le 1$,    $F(x) = 1$ for $x > 1$,

let $(u_1, u_2, \ldots)$ be the stochastic process where

    $u_n = \max(x_1, \ldots, x_n).$

Let $v_n = n(1 - u_n)$. Show that the sequence $(v_1, v_2, \ldots)$ converges in distribution
to $F(v)$ where $F(v) = 1 - e^{-v}$ for $v > 0$, and $F(v) = 0$ for $v \le 0$.


4.4 (Continuation) If $g(x)$ is any differentiable single-valued continuous and
increasing function of $x$ over $(0, \infty)$ and if $w_n = g(n(1 - u_n))$ show that
$(w_1, w_2, \ldots)$ converges in distribution to $H(w)$, where $H(w)$ is given by


e-r-Hw) 




for $w$ on the interval $(g(0), g(+\infty))$ and 0 elsewhere, where $g^{-1}(w)$ is the inverse
of $g(x)$.

4.5 A discrete random variable $x_n$, which arises in the theory of extreme
observations, is known to have c.d.f.

    $F_n(x) = 1 - \frac{n(n-1) \cdots (n-r+1)}{(n+nx)(n+nx-1) \cdots (n+nx-r+1)}$

where $r < n + 1$, the mass points of $x_n$ being $1/n, 2/n, \ldots$. Show that
the sequence of random variables $(x_1, x_2, \ldots)$ converges in distribution to $F(x)$,
where

    $F(x) = 0$ for $x \le 0$,    $F(x) = 1 - \frac{1}{(1+x)^r}$ for $x > 0.$


4.6 Referring to 4.3.6a, show that if d is any constant ^ 0 

    $\lim_{n \to \infty} P(x_n + n y_n \le d) = 0.$

4.7 Suppose (a?!, ajg,.. .) is a stochastic process such that * 0, 

« = 1, 2,..., and cov (a:,, ar,) = = 1, 2,... where a® is 

finite and /> lies on (0, 1). Let 

Wfc = ^ (^1 + ‘ + ^2fc-l) 

and 

1 , 

^ (^2 + ^4 + • • * + ^ 2 *) 

^ = 1 , 2 ..... 

Show that the vector stochastic process $(u_1, v_1; u_2, v_2; \ldots)$ converges in distribution
to $(u, v)$ where $u$ and $v$ are equivalent random variables having 0 means
and variances $\sigma^2$.

4.8 Suppose {x^y •. •) ^n) n independent random variables all having 
means 0 and variances 1. Consider the set of inequalities 

Va 


where ^ (^i + * * ’ + ^a)* 

Let $G_n$ be the event that all inequalities are satisfied. Then $\bar{G}_n$ is the event that at
least one inequality fails. If the $\alpha$th inequality in the set is the first to fail given
that $\bar{G}_n$ occurs, show by methods similar to those used in dealing with Kolmogorov's
inequality, that for any $\lambda > 0$,

„ +(A*-lK(a|C„). ’ 

where $\mathscr{E}(\alpha \mid \bar{G}_n)$ is the conditional mean value of $\alpha$ given that $\bar{G}_n$ occurs, and
where $\alpha$ is the random variable denoting the first of the above inequalities to fail.

4.9 {Continuation) Consider the inequalities 

+ • • 4- irj < aA®, a = 1,. .., /i. 

Show that if $\bar{G}_n$ is the event that at least one of the inequalities fails, then $P(\bar{G}_n)$
satisfies the same inequality as in the preceding problem.

4.10 If $x_1, \ldots, x_n$ are independent and positive random variables whose
means are all equal to $\mu$, show that for any $\lambda > 0$, the probability that all of the
inequalities $x_1 x_2 \cdots x_\alpha < \lambda \mu^\alpha$, $\alpha = 1, \ldots, n$, are satisfied is at least $1 - 1/\lambda$.

4.11 Extend the strong law of large numbers to the case where the sequence
of independent random variables $(x_1, x_2, \ldots)$ have (finite) means $\mu_1, \mu_2, \ldots$ as
well as (finite) variances $\sigma_1^2, \sigma_2^2, \ldots$.


4.12 If $(x_1, \ldots, x_n)$ are independent random variables all with zero
medians, show that

^( 1*1 + • • • + *«|) > <p{n)d 

where 


J = + ••• +.fK|] 

n 


and 


(2k + 1)! 
22*(Ar!)^^ 


qilk + 1) = ^2A: + 2) 

a result due to Tukey (1946). 

4.13 Suppose (a?!,..., a;^) is a /:-dimensional random variable with mean 
(0,.... 0) and covariance matrix ||or^^|| where =* a® and Show that 


P(nmlx,l > to) D'/l -p + Vl + (A: _ l)p]. 


a result due to Olkin and Pratt (1958): 

4.14 In 4.3.6a show that the four statements (4.3.18), (4.3.19), (4.3.20), and
(4.3.21) are true if the two stochastic processes $(x_1, x_2, \ldots)$ and $(y_1, y_2, \ldots)$
converge respectively to the random variable $x$ and the positive constant $c$.

4.15 Prove that if $(x_1, x_2, \ldots)$ is a sequence of random variables with p.d.f.'s
$(f_1(x), f_2(x), \ldots)$ and if $\lim_{\alpha \to \infty} f_\alpha(x) = f(x)$ for all $x$ in $R_1$ except possibly for a set of
probability 0, where $f(x)$ is a p.d.f., then

    $\lim_{\alpha \to \infty} \int_E f_\alpha(x)\,dx = \int_E f(x)\,dx$

uniformly for all Borel sets $E$. [This result is due to Scheffé (1947). It should be
noted that the conditions in Scheffé's theorem are stronger than those in 4.3.4.
A discussion and comparison of various conditions for convergence of distributions
have been given by Robbins (1948).]

4.16 If $(x, x_1, x_2, \ldots)$ is a stochastic process such that $(x_1, x_2, \ldots)$ converges
in probability to the random variable $x$ and if $g_1(x), g_2(x), \ldots$ is a sequence of
continuous functions which converge uniformly to $g(x)$ over a bounded interval,
show that $(g_1(x_1), g_2(x_2), \ldots)$ is a stochastic process which converges in probability
to $g(x)$.

4.17 Let $x_1, \ldots, x_n$ be independent random variables and let $y_1 = x_1$,
$y_2 = x_1 + x_2$, $\ldots$, $y_n = x_1 + \cdots + x_n$. If $F_i(y_i)$ denotes the c.d.f. of $y_i$,
$i = 1, \ldots, n$, and $F(y_1, \ldots, y_n)$ the c.d.f. of $(y_1, \ldots, y_n)$, show that

    $F(y_1, \ldots, y_n) \ge F_1(y_1) \cdots F_n(y_n),$

a result due to Robbins (1954).

4.18 Prove 4.3.7.



CHAPTER 5 


Characteristic Functions and 
Generating Functions 


5.1 CASE OF A ONE-DIMENSIONAL RANDOM VARIABLE 

One of the most important classes of problems in mathematical statistics
is the determination of distribution functions of measurable functions of
random variables, that is, functions of random variables which are random
variables themselves. A few methods were presented in Section 2.8 for
dealing with these problems. These methods, however, are often technically
difficult or tedious to carry out in specific cases. Some situations, particularly
those involving linear functions of independent random variables,
can often be handled in an elegant manner by making use of the characteristic
function of the particular function of the random variables under
consideration. The characteristic function and related devices are also
useful in some cases for such tasks as generating moments and cumulants of
distributions and testing independence of two or more functions of random
variables. This chapter will be devoted to these methods and their application.

The characteristic function $\varphi(t)$ of a random variable $x$ having c.d.f.
$F(x)$ is defined as

(5.1.1)  $\varphi(t) = \mathscr{E}(e^{itx}) = \int_{-\infty}^{\infty} e^{itx}\,dF(x)$

where $i = \sqrt{-1}$ and $t$ is real.* It is sometimes convenient to say that
$\varphi(t)$ is the characteristic function corresponding to $F(x)$ or, more briefly,
the characteristic function of $x$.

Since $e^{itx} = \cos tx + i \sin tx$, and $\cos tx$ and $\sin tx$ are both integrable
over $R_1$ for any real $t$, $\varphi(t)$ is a complex number whose real and imaginary
parts are finite for every value of $t$.

* Throughout this book we denote $\sqrt{-1}$ by i to avoid confusion in various
sections with the use of italic $i$ for indexes of summation.

The moment-generating function $\psi(t)$ is defined as

(5.1.2)  $\psi(t) = \mathscr{E}(e^{tx}) = \int_{-\infty}^{\infty} e^{tx}\,dF(x)$.

We are usually interested in $\psi(t)$ for values of $t$ in some neighborhood of
$t = 0$. But $\psi(t)$ does not exist for every c.d.f. and all values of $t$, as does
$\varphi(t)$.

The factorial moment-generating function $\theta(t)$, if it exists, is obtained by
replacing $t$ by $\log t$ in (5.1.2), thus yielding

(5.1.3)  $\theta(t) = \psi(\log t) = \mathscr{E}(t^x)$.

If the random variable $x$ is discrete so that the mass points of $x$ are
positive integers, then $\theta(t)$, if it exists, is also called the probability generating
function of the random variable $x$. For if $\theta(t)$ is expanded into a series in
$t$, the coefficient of $t^x$ is $p(x)$, the p.f. of $x$.
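
As a small illustration of (5.1.3), the following Python sketch stores the probability generating function of one throw of a fair die as a list of coefficients, so that the coefficient of $t^x$ is $p(x)$. It also relies on the standard fact, not stated above, that the derivatives of $\theta(t)$ at $t = 1$ are the factorial moments $\mathscr{E}(x)$, $\mathscr{E}[x(x-1)]$, and so on. The die and the helper names are assumptions chosen only for this example.

```python
from fractions import Fraction

# Coefficient list of theta(t) = sum_x p(x) t**x for a fair die:
# index x holds p(x), so index 0 is 0 and indexes 1..6 are 1/6.
pgf = [Fraction(0)] + [Fraction(1, 6)] * 6

def derivative(coeffs):
    """Coefficient list of the derivative of a polynomial in t."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, t):
    """Value of the polynomial with the given coefficients at t."""
    return sum(c * t**k for k, c in enumerate(coeffs))

first_factorial = evaluate(derivative(pgf), 1)                # E[x]
second_factorial = evaluate(derivative(derivative(pgf)), 1)   # E[x(x-1)]
variance = second_factorial + first_factorial - first_factorial**2
```

Exact rational arithmetic is used so that the mean and variance can be compared with hand calculation without rounding error.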

If the $r$th moment $\mu'_r(x)$ exists, we can differentiate (5.1.1) $h$ times,
$0 < h \leq r$, with respect to $t$ and obtain

(5.1.4)  $\varphi^{(h)}(t) = i^h \int_{-\infty}^{\infty} x^h e^{itx}\,dF(x)$,

from which we find the $h$th moment $\mu'_h(x)$ to be

(5.1.5)  $\mu'_h(x) = \dfrac{\varphi^{(h)}(0)}{i^h}$,  $0 \leq h \leq r$.

For convenience we define $\varphi^{(0)}(t)$ to be $\varphi(t)$.

Similarly, if $\psi(t)$ exists for values of $t$ in a neighborhood of zero, we have

(5.1.6)  $\mu'_r(x) = \psi^{(r)}(0)$.
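
Formula (5.1.5) can be checked numerically. The sketch below is only an illustration under stated assumptions: the random variable is one throw of a fair die (a choice made here, not in the text), and the derivatives at $t = 0$ are replaced by central differences, so the results are approximations.

```python
import cmath

def die_cf(t):
    """Characteristic function E[exp(i t x)] of one throw of a fair die."""
    return sum(cmath.exp(1j * t * face) for face in range(1, 7)) / 6

def moment_from_cf(cf, h, step=1e-3):
    """Approximate the h-th moment as cf^(h)(0) / i**h, following (5.1.5).

    The derivative at 0 is replaced by a central difference, so this
    sketch only covers h = 1 and h = 2.
    """
    if h == 1:
        derivative = (cf(step) - cf(-step)) / (2 * step)
    elif h == 2:
        derivative = (cf(step) - 2 * cf(0.0) + cf(-step)) / step**2
    else:
        raise ValueError("only h = 1 and h = 2 are implemented")
    return (derivative / 1j**h).real

mean = moment_from_cf(die_cf, 1)            # close to 7/2
second_moment = moment_from_cf(die_cf, 2)   # close to 91/6
```

The exact values for the die are $7/2$ and $91/6$; the finite-difference step controls how closely they are reproduced.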

If all moments including the $2s$th moment of $x$ exist, we may write

(5.1.7)  $\varphi(t) = \int_{-\infty}^{\infty} \left[ 1 + \sum_{h=1}^{2s-1} \frac{(itx)^h}{h!} + \frac{(itx)^{2s}}{(2s)!} (\cos t'x + i \sin t''x) \right] dF(x)$

where $t'$ and $t''$ are numbers in the interval $(0, t)$. Assuming $x$ to be
nondegenerate, then $\mu'_{2s} \neq 0$ and we may write

(5.1.8)  $\varphi(t) = 1 + \sum_{h=1}^{2s-1} \mu'_h \frac{(it)^h}{h!} + \mu'_{2s} \frac{(it)^{2s}}{(2s)!}\, g_1(t, s)$

where $g_1(t, s)$ is a complex function whose real and imaginary components
are, respectively, the mean values $\mathscr{E}(x^{2s} \cos t'x / \mu'_{2s})$ and
$\mathscr{E}(x^{2s} \sin t''x / \mu'_{2s})$. We note that $|g_1(t, s)| \leq 1$. Furthermore, since $t'$
and $t''$ both lie in the interval $(0, t)$ we have $\lim_{t \to 0} g_1(t, s) = 1$.

In a similar manner, if all moments including the $(2s - 1)$th moment
exist we may write

(5.1.9)  $\varphi(t) = 1 + \sum_{h=1}^{2s-2} \mu'_h \frac{(it)^h}{h!} + \mathscr{E}(|x|^{2s-1}) \frac{(it)^{2s-1}}{(2s-1)!}\, g_2(t, s)$

where $g_2(t, s)$ is a complex function whose real and imaginary components
are $\mathscr{E}(x^{2s-1} \cos t_1 x) / \mathscr{E}(|x|^{2s-1})$ and $\mathscr{E}(x^{2s-1} \sin t_2 x) / \mathscr{E}(|x|^{2s-1})$
respectively, where $t_1$ and $t_2$ are numbers in the interval $(0, t)$, and
$\mathscr{E}(|x|^{2s-1})$ is the $(2s - 1)$th absolute moment as defined by (3.3.13), which
is $\neq 0$ if $x$ is nondegenerate. Again, we have $|g_2(t, s)| \leq 1$. Furthermore,
$\lim_{t \to 0} g_2(t, s) = \mu'_{2s-1} / \mathscr{E}(|x|^{2s-1})$, which, of course, is finite.

No matter whether the highest moment is odd or even, we may summarize
as follows:

5.1.1 If the $r$th moment of a random variable $x$ exists, then $\varphi(t)$ can be
expanded in a neighborhood of $t = 0$ as follows:

(5.1.10)  $\varphi(t) = 1 + \sum_{h=1}^{r} \mu'_h \frac{(it)^h}{h!} + o(t^r)$

where $\lim_{t \to 0} o(t^r)/t^r = 0$.

If $\varphi(t)$ can be expanded as stated in (5.1.10), we may also expand
$\log \varphi(t)$ as follows:

(5.1.11)  $\log \varphi(t) = \sum_{h=1}^{r} \kappa_h \frac{(it)^h}{h!} + o(t^r)$.

The quantities $\kappa_h$ are called semi-invariants or cumulants of the c.d.f.
$F(x)$, originally defined and studied by Thiele (1903).

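Note that any semi-invariant $\kappa_h$ is a polynomial in the moments
$\mu'_1, \mu'_2, \ldots, \mu'_h$, and vice versa. The first few semi-invariants are as follows:

(5.1.12)  $\kappa_1 = \mu'_1 = \mu$
          $\kappa_2 = \mu'_2 - (\mu'_1)^2 = \sigma^2$
          $\kappa_3 = \mu'_3 - 3\mu'_1 \mu'_2 + 2(\mu'_1)^3$

and conversely, we have

(5.1.13)  $\mu'_1 = \kappa_1$
          $\mu'_2 = \kappa_2 + \kappa_1^2$
          $\mu'_3 = \kappa_3 + 3\kappa_1 \kappa_2 + \kappa_1^3$.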
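
The relations (5.1.12) and (5.1.13) between the first three moments and the first three semi-invariants are the standard ones, and they can be checked by a round trip. The sketch below is an illustration only; it uses the raw moments of one throw of a fair die, a choice made here as an assumption, with exact rational arithmetic.

```python
from fractions import Fraction

def cumulants_from_moments(m1, m2, m3):
    """First three semi-invariants from the raw moments, as in (5.1.12)."""
    k1 = m1
    k2 = m2 - m1**2
    k3 = m3 - 3 * m1 * m2 + 2 * m1**3
    return k1, k2, k3

def moments_from_cumulants(k1, k2, k3):
    """Inverse relations, as in (5.1.13)."""
    m1 = k1
    m2 = k2 + k1**2
    m3 = k3 + 3 * k1 * k2 + k1**3
    return m1, m2, m3

# Raw moments E[x], E[x^2], E[x^3] of one throw of a fair die.
faces = range(1, 7)
die_moments = tuple(Fraction(sum(x**r for x in faces), 6) for r in (1, 2, 3))

die_cumulants = cumulants_from_moments(*die_moments)
round_trip = moments_from_cumulants(*die_cumulants)
```

For the die the third semi-invariant is zero, as expected for a distribution symmetric about its mean.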
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There are many problems in probability theory and its applications, 
especially to sampling theory, in which it is quite easy to find the
characteristic function of a function of the components of a multidimensional
random variable, particularly a linear function. The question which
arises, of course, is how to find the c.d.f. of the random variable from its
characteristic function. An answer to this question can be found in many
cases in the following theorem due to Lévy (1925):

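5.1.2 Let $x$ be a random variable having characteristic function $\varphi(t)$ and
c.d.f. $F(x)$. Then if $F(x)$ is continuous at $x = x' \pm \delta$, $\delta > 0$, we have

(5.1.14)  $F(x' + \delta) - F(x' - \delta) = \lim_{A \to \infty} \frac{1}{\pi} \int_{-A}^{A} \frac{\sin \delta t}{t}\, e^{-itx'} \varphi(t)\,dt$.

Furthermore, if $\int_{-\infty}^{\infty} |\varphi(t)|\,dt < +\infty$, a p.d.f. $f(x)$ exists at $x = x'$
and

(5.1.15)  $f(x') = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx'} \varphi(t)\,dt$.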

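To prove this theorem we replace $\varphi(t)$ in (5.1.14) by its integral expression
in (5.1.1) and write

(5.1.16)  $G(x', A, \delta) = \frac{1}{\pi} \int_{-A}^{A} \frac{\sin \delta t}{t} \int_{-\infty}^{\infty} e^{it(x - x')}\,dF(x)\,dt$.

Since $\left| \frac{\sin \delta t}{t}\, e^{it(x - x')} \right| \leq \delta$, we can invert the order of integration in
(5.1.16), obtaining

(5.1.17)  $G(x', A, \delta) = \int_{-\infty}^{\infty} m(x, x', A, \delta)\,dF(x)$

where

(5.1.18)  $m(x, x', A, \delta) = \frac{1}{\pi} \int_{-A}^{A} \frac{\sin \delta t}{t}\, e^{it(x - x')}\,dt$
          $= \frac{2}{\pi} \int_{0}^{A} \frac{\sin \delta t}{t} \cos [t(x - x')]\,dt$
          $= \frac{1}{\pi} \int_{0}^{A} \frac{\sin (x - x' + \delta)t}{t}\,dt - \frac{1}{\pi} \int_{0}^{A} \frac{\sin (x - x' - \delta)t}{t}\,dt$.

Taking limits, we have

(5.1.19)  $\lim_{A \to \infty} G(x', A, \delta) = \int_{-\infty}^{\infty} \lim_{A \to \infty} m(x, x', A, \delta)\,dF(x)$.

But making use of the fact that $\lim_{T \to \infty} \int_{0}^{T} \frac{\sin u}{u}\,du = \frac{\pi}{2}$, it will be seen that

(5.1.20)  $\lim_{A \to \infty} m(x, x', A, \delta) = 0$,            $x < x' - \delta$, $x > x' + \delta$
                                          $= \frac{1}{2}$,  $x = x' - \delta$, $x = x' + \delta$
                                          $= 1$,            $x' - \delta < x < x' + \delta$.

Therefore, since $F(x)$ is continuous at $x = x' \pm \delta$, it follows, upon using
these values of $\lim_{A \to \infty} m(x, x', A, \delta)$ in (5.1.19), that

    $\lim_{A \to \infty} G(x', A, \delta) = \int_{x' - \delta}^{x' + \delta} dF(x) = F(x' + \delta) - F(x' - \delta)$

which establishes (5.1.14).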

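Now if we divide (5.1.14) by $2\delta$, we have

(5.1.21)  $\frac{F(x' + \delta) - F(x' - \delta)}{2\delta} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{\sin \delta t}{\delta t}\, e^{-itx'} \varphi(t)\,dt$.

If $F(x)$ has a derivative $f(x')$ at $x = x'$, since

    $\left| \frac{\sin \delta t}{\delta t}\, e^{-itx'} \varphi(t) \right| \leq |\varphi(t)|$  and  $\int_{-\infty}^{\infty} |\varphi(t)|\,dt < +\infty$,

we have

    $\lim_{\delta \to 0} \frac{F(x' + \delta) - F(x' - \delta)}{2\delta} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \lim_{\delta \to 0} \left( \frac{\sin \delta t}{\delta t} \right) e^{-itx'} \varphi(t)\,dt$.

Since $\lim_{\delta \to 0} (\sin \delta t / \delta t) = 1$, we therefore obtain

    $f(x') = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx'} \varphi(t)\,dt$

which is formula (5.1.15).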
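
The inversion formula (5.1.15) lends itself to a direct numerical check. The following sketch is a hedged illustration, not part of the proof: it assumes the characteristic function $e^{-t^2/2}$, which is the well-known characteristic function of the density $e^{-x^2/2}/\sqrt{2\pi}$, truncates the integral to a finite range, and applies the trapezoidal rule. The truncation limit and the number of steps are arbitrary choices.

```python
import cmath
import math

def normal_cf(t):
    """Characteristic function exp(-t^2/2), assumed for the example."""
    return math.exp(-t * t / 2)

def density_from_cf(cf, x, limit=40.0, steps=8000):
    """Approximate (1/2pi) * integral of exp(-i t x) cf(t) dt over
    (-limit, limit) with the trapezoidal rule, following (5.1.15)."""
    h = 2 * limit / steps
    total = 0j
    for j in range(steps + 1):
        t = -limit + j * h
        weight = 0.5 if j in (0, steps) else 1.0
        total += weight * cmath.exp(-1j * t * x) * cf(t)
    return (total * h / (2 * math.pi)).real

f_at_0 = density_from_cf(normal_cf, 0.0)
f_at_1 = density_from_cf(normal_cf, 1.0)
```

The two values can be compared with $1/\sqrt{2\pi}$ and $e^{-1/2}/\sqrt{2\pi}$. The condition that $|\varphi(t)|$ be integrable is what makes the truncation harmless here.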

If two random variables $x_1$ and $x_2$ have identical c.d.f.'s, then it is evident
that their characteristic functions are identical. Conversely, suppose the
random variables have identical characteristic functions $\varphi(t)$. Then if
$F_1(x)$ and $F_2(x)$ are the c.d.f.'s of the two random variables it follows from
(5.1.14) that if $x' \pm \delta$ is any interval such that $F_1(x)$ and $F_2(x)$ are continuous
at the end points $x' \pm \delta$, then

    $F_1(x' + \delta) - F_1(x' - \delta) = F_2(x' + \delta) - F_2(x' - \delta)$

which, together with the fact that $F_1(x)$ and $F_2(x)$ are c.d.f.'s, and with our
convention of making all c.d.f.'s continuous on the right, implies that
$F_1(x) \equiv F_2(x)$.

Thus, we have the following result:

5.1.3 If $x_1$ and $x_2$ are random variables having c.d.f.'s $F_1(x)$ and $F_2(x)$
respectively, and characteristic functions $\varphi_1(t)$ and $\varphi_2(t)$ respectively, a
necessary and sufficient condition for $F_1(x) \equiv F_2(x)$ is that $\varphi_1(t) \equiv \varphi_2(t)$.

This one-to-one correspondence between c.d.f.’s and characteristic 
functions is highly useful in probability theory since it provides a basis by 
which c.d.f.’s may be identified from their corresponding characteristic 
functions. 


Example. Suppose $x_1, \ldots, x_k$ are statistically independent random variables
having p.d.f.

    $f(x_1, \ldots, x_k) = e^{-(x_1 + \cdots + x_k)}$,  $x_i > 0$, $i = 1, \ldots, k$
                         $= 0$,  otherwise,

and let $L = \sum_{i=1}^{k} x_i$. Find the moments and p.d.f. of $L$.

The characteristic function of $L$ is

(5.1.22)  $\varphi(t) = \int_{0}^{\infty} \cdots \int_{0}^{\infty} e^{it(x_1 + \cdots + x_k)}\, e^{-(x_1 + \cdots + x_k)}\,dx_1 \cdots dx_k = (1 - it)^{-k}$.

The $r$th moment $\mu'_r(L)$ is given by applying (5.1.5) for $h = r$:

    $\mu'_r(L) = \frac{\varphi^{(r)}(0)}{i^r} = \frac{\Gamma(k + r)}{\Gamma(k)}$.

The p.d.f. of $L$ is, by (5.1.15),

(5.1.23)  $f(L) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itL} (1 - it)^{-k}\,dt$.

The integral can be evaluated by contour integration in the complex plane by
making the transformation

    $z = -L(1 - it)$

which gives

    $f(L) = L^{k-1} e^{-L} \cdot H$

where $H$ is the Hankel integral given by

(5.1.24)  $H = \frac{1}{2\pi i} \int_{-L - i\infty}^{-L + i\infty} e^{-z} (-z)^{-k}\,dz$

which has the value $1/\Gamma(k)$ [see, for example, Whittaker and Watson (1927)].
The p.d.f. of $L$ is therefore given by

(5.1.25)  $f(L) = \frac{L^{k-1} e^{-L}}{\Gamma(k)}$,  $L > 0$
               $= 0$,  $L \leq 0$.

5.2 CASE OF A k-DIMENSIONAL RANDOM VARIABLE

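Suppose $(x_1, \ldots, x_k)$ is a $k$-dimensional random variable having c.d.f.
$F(x_1, \ldots, x_k)$. The characteristic function $\varphi(t_1, \ldots, t_k)$ of $(x_1, \ldots, x_k)$ is
defined as

(5.2.1)  $\varphi(t_1, \ldots, t_k) = \mathscr{E}\left[ \exp\left( i \sum_{j=1}^{k} t_j x_j \right) \right] = \int_{R_k} \exp\left( i \sum_{j=1}^{k} t_j x_j \right) dF(x_1, \ldots, x_k)$.

The moment-generating function $\psi(t_1, \ldots, t_k)$ is defined by

(5.2.2)  $\psi(t_1, \ldots, t_k) = \mathscr{E}\left( e^{t_1 x_1 + \cdots + t_k x_k} \right)$.

Similarly, the factorial moment-generating function $\theta(t_1, \ldots, t_k)$ is
defined by replacing $t_i$ by $\log t_i$, $i = 1, \ldots, k$, in $\psi(t_1, \ldots, t_k)$.

If the joint moment $\mu'_{r_1 \cdots r_k}$ exists, it can be obtained by differentiating
$\varphi(t_1, \ldots, t_k)$ as follows:

(5.2.3)  $\mu'_{r_1 \cdots r_k} = \frac{1}{i^{(r_1 + \cdots + r_k)}} \left. \frac{\partial^{\,r_1 + \cdots + r_k} \varphi(t_1, \ldots, t_k)}{\partial t_1^{r_1} \cdots \partial t_k^{r_k}} \right|_{t_1 = \cdots = t_k = 0}$.

If $\mu'_{r_1 \cdots r_k}$ exists, then all joint moments $\mu'_{h_1 \cdots h_k}$, $0 \leq h_i \leq r_i$, $i = 1, \ldots, k$,
exist.

It should be observed that the characteristic function of any subset of
the components of the random variable $(x_1, \ldots, x_k)$ is obtained by setting
equal to zero the $t$'s corresponding to the random variables not included
in the subset. For instance, the characteristic function of the random
variable $(x_1, \ldots, x_{k_1})$, $k_1 < k$, is $\varphi(t_1, \ldots, t_{k_1}, 0, \ldots, 0)$.

The extension of Lévy's theorem to the case of $k$ random variables is
straightforward and may be stated as follows:

5.2.1 Let $(x_1, \ldots, x_k)$ be a $k$-dimensional random variable with characteristic
function $\varphi(t_1, \ldots, t_k)$ and c.d.f. $F(x_1, \ldots, x_k)$. Let $I_\delta$ be
the interval $x'_i - \delta_i < x_i \leq x'_i + \delta_i$, $i = 1, \ldots, k$, in $R_k$, $\delta_i > 0$,
and let $F(x_1, \ldots, x_k)$ be continuous on the boundary of the closed
interval $x'_i - \delta_i \leq x_i \leq x'_i + \delta_i$, $i = 1, \ldots, k$. Then

(5.2.4)  $P((x_1, \ldots, x_k) \in I_\delta) = \lim_{A \to \infty} \frac{1}{\pi^k} \int_{-A}^{A} \cdots \int_{-A}^{A} \prod_{i=1}^{k} \left[ \frac{\sin \delta_i t_i}{t_i}\, e^{-i t_i x'_i} \right] \varphi(t_1, \ldots, t_k)\,dt_1 \cdots dt_k$.

Furthermore, if $\int_{R_k} |\varphi(t_1, \ldots, t_k)|\,dt_1 \cdots dt_k < +\infty$, a p.d.f.
$f(x_1, \ldots, x_k)$ exists at $(x'_1, \ldots, x'_k)$ and

(5.2.5)  $f(x'_1, \ldots, x'_k) = \frac{1}{(2\pi)^k} \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \exp\left( -i \sum_{j=1}^{k} t_j x'_j \right) \varphi(t_1, \ldots, t_k)\,dt_1 \cdots dt_k$.

The proof of 5.2.1 is similar to that of 5.1.2 and is omitted.

Theorem 5.1.3 on the one-to-one correspondence between c.d.f.'s and
characteristic functions can be extended to the case of a $k$-dimensional
random variable without new difficulties.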

5.3 CHARACTERISTIC FUNCTIONS OF INDEPENDENT 
RANDOM VARIABLES 

Characteristic functions are sometimes useful in determining whether 
two random variables are independent without having to determine first 
the distribution function of the two random variables. The essential 
result here may be stated as follows: 

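5.3.1 If $(x_1, x_2)$ is a two-dimensional random variable, a necessary and
sufficient condition for $x_1$ and $x_2$ to be independent is that

(5.3.1)  $\varphi(t_1, t_2) = \varphi(t_1, 0) \cdot \varphi(0, t_2)$

where $\varphi(t_1, t_2)$ is the characteristic function of $(x_1, x_2)$.

Note that $\varphi(t_1, 0)$ is the characteristic function of $x_1$ and $\varphi(0, t_2)$ is the
characteristic function of $x_2$. It is convenient to denote $\varphi(t_1, 0)$ and
$\varphi(0, t_2)$ by $\varphi_1(t_1)$ and $\varphi_2(t_2)$ respectively.

To see that the condition is necessary, we assume $x_1$ and $x_2$ to be
independent, that is, $F(x_1, x_2) \equiv F_1(x_1) F_2(x_2)$. Then we have

(5.3.2)  $\varphi(t_1, t_2) = \int_{R_2} e^{i(t_1 x_1 + t_2 x_2)}\,dF_1(x_1)\,dF_2(x_2)$
                    $= \int_{-\infty}^{\infty} e^{i t_1 x_1}\,dF_1(x_1) \int_{-\infty}^{\infty} e^{i t_2 x_2}\,dF_2(x_2)$

that is, $\varphi(t_1, t_2) = \varphi_1(t_1) \cdot \varphi_2(t_2)$.

Now consider the sufficiency. Assuming (5.3.1) to hold, we have, by
applying formula (5.2.4) for $k = 2$,

(5.3.3)  $P(x'_i - \delta_i < x_i \leq x'_i + \delta_i;\ i = 1, 2)$
         $= \lim_{A \to \infty} \frac{1}{\pi^2} \int_{-A}^{A} \int_{-A}^{A} \prod_{i=1}^{2} \left[ \frac{\sin \delta_i t_i}{t_i}\, e^{-i t_i x'_i} \varphi_i(t_i) \right] dt_1\,dt_2$
         $= \prod_{i=1}^{2} \left[ \lim_{A \to \infty} \frac{1}{\pi} \int_{-A}^{A} \frac{\sin \delta_i t_i}{t_i}\, e^{-i t_i x'_i} \varphi_i(t_i)\,dt_i \right]$.

That is,

(5.3.4)  $P((x_1, x_2) \in I_\delta) = P(x_1 \in I_\delta^{(1)}) \cdot P(x_2 \in I_\delta^{(2)})$

where $I_\delta^{(i)}$ is the interval $x'_i - \delta_i < x_i \leq x'_i + \delta_i$, $i = 1, 2$, and $I_\delta$ is the
Cartesian product $I_\delta^{(1)} \times I_\delta^{(2)}$. But we know from 2.4.2 that (5.3.4) implies
that

    $F(x_1, x_2) \equiv F_1(x_1) \cdot F_2(x_2)$

that is, independence of $x_1$ and $x_2$.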
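
The factorization criterion (5.3.1) can be tried on small discrete examples. The sketch below is an illustration under assumptions made only here: the outcomes are equally likely pairs built from two fair dice, one pair being two separate dice (independent) and the other being the first die together with the total (dependent). The factorization is checked on a finite grid of $(t_1, t_2)$, which can expose a failure but does not by itself prove independence.

```python
import cmath
from itertools import product

def joint_cf(pairs, t1, t2):
    """Characteristic function (5.2.1) of a two-dimensional discrete
    variable given as a list of equally likely (x1, x2) outcomes."""
    return sum(cmath.exp(1j * (t1 * a + t2 * b)) for a, b in pairs) / len(pairs)

def factorization_gap(pairs, t1, t2):
    """|phi(t1, t2) - phi(t1, 0) * phi(0, t2)|, the failure of (5.3.1)."""
    return abs(joint_cf(pairs, t1, t2)
               - joint_cf(pairs, t1, 0.0) * joint_cf(pairs, 0.0, t2))

faces = range(1, 7)
independent = [(a, b) for a, b in product(faces, faces)]      # two separate dice
dependent = [(a, a + b) for a, b in product(faces, faces)]    # first die and total

grid = [(0.3 * i, 0.3 * j) for i in range(-5, 6) for j in range(-5, 6)]
max_gap_independent = max(factorization_gap(independent, s, t) for s, t in grid)
max_gap_dependent = max(factorization_gap(dependent, s, t) for s, t in grid)
```

The first gap is zero up to rounding, while the second is clearly positive, in line with 5.3.1.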

The extension of 5.3.1 to any (finite) number of random variables is 
straightforward and is left to the reader. 

Another useful property of characteristic functions of independent
random variables is their particular form in the case of linear functions of
random variables. The essential result on this matter, which can be
readily verified by the reader, may be stated as follows:

5.3.2 Suppose $L$ is a linear function of $k$ independent random variables
$x_1, \ldots, x_k$ defined as follows:

(5.3.5)  $L = \sum_{i=1}^{k} c_i x_i$.

Let $\varphi(t)$ be the characteristic function of $L$ and $\varphi_i(t)$ that of $x_i$,
$i = 1, \ldots, k$. Then we have

(5.3.6)  $\varphi(t) = \prod_{i=1}^{k} \varphi_i(c_i t)$.
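
A quick way to see (5.3.6) at work is to compare both sides for a small discrete case. In the sketch below, $x_1$ and $x_2$ are assumed to be two independent throws of a fair die and the coefficients $c_1 = 1$, $c_2 = 2$ are arbitrary; the left side is computed by enumerating all 36 outcomes of $L = c_1 x_1 + c_2 x_2$ and the right side as a product of one-dimensional characteristic functions.

```python
import cmath
from itertools import product

faces = range(1, 7)

def die_cf(t):
    """Characteristic function of one throw of a fair die."""
    return sum(cmath.exp(1j * t * x) for x in faces) / 6

def linear_cf_direct(c1, c2, t):
    """E[exp(i t L)] for L = c1*x1 + c2*x2, by enumerating all 36
    equally likely outcomes of two independent dice."""
    return sum(cmath.exp(1j * t * (c1 * a + c2 * b))
               for a, b in product(faces, faces)) / 36

def linear_cf_product(c1, c2, t):
    """Right-hand side of (5.3.6): phi_1(c1 t) * phi_2(c2 t)."""
    return die_cf(c1 * t) * die_cf(c2 * t)

ts = [0.25 * j for j in range(-8, 9)]
max_difference = max(abs(linear_cf_direct(1, 2, t) - linear_cf_product(1, 2, t))
                     for t in ts)
```

The two computations agree to rounding error for every $t$ tried.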

Suppose $x_1$ and $x_2$ are independent random variables having c.d.f.'s
$F(x_1; \theta_1)$ and $F(x_2; \theta_2)$, respectively, where $\theta_1$ and $\theta_2$ are values of a
parameter $\theta$. Let $L$ denote the random variable $x_1 + x_2$. Then if the
c.d.f. of $L$ is $F(L; \theta_1 + \theta_2)$, the c.d.f. $F(x; \theta)$ is said to be reproductive with
respect to $\theta$. We may similarly speak of a p.f., a p.d.f. or a distribution
depending on $\theta$ as being reproductive with respect to $\theta$. Reproductivity
is an important property which will be used frequently in later chapters.
The notion of reproductivity can be extended without difficulty to the case
where either or both the random variable $x$ and the parameter $\theta$ are
multidimensional. Characteristic functions provide a useful criterion for
determining whether a c.d.f. $F(x; \theta)$ is reproductive, which can be stated as
follows:

5.3.3 Suppose $x_1$ and $x_2$ are independent random variables having c.d.f.'s
$F(x_1; \theta_1)$ and $F(x_2; \theta_2)$ and characteristic functions $\varphi(t; \theta_1)$ and
$\varphi(t; \theta_2)$, respectively, where $\theta_1$ and $\theta_2$ are parameters. Then a
necessary and sufficient condition for $F(x; \theta)$ to be reproductive with
respect to $\theta$ is that

(5.3.7)  $\varphi(t; \theta_1)\, \varphi(t; \theta_2) = \varphi(t; \theta_1 + \theta_2)$.

The proof of this is straightforward and is left to the reader. 
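
Criterion (5.3.7) is easy to explore numerically. The sketch below evaluates the gap between the two sides on a grid of $t$ values for three families, all of them assumptions of the example: the family with characteristic function $(1 - it)^{-\theta}$, which extends (5.1.22) to a general index, the Poisson family with the well-known characteristic function $\exp[\theta(e^{it} - 1)]$, and the uniform family on $(0, \theta)$, which is not reproductive. A zero gap on a finite grid is only suggestive, whereas a nonzero gap settles the matter.

```python
import cmath

def gamma_type_cf(t, theta):
    """(1 - it)^(-theta), the form of (5.1.22) with index theta."""
    return (1 - 1j * t) ** (-theta)

def poisson_cf(t, theta):
    """exp(theta (e^{it} - 1)), the Poisson characteristic function."""
    return cmath.exp(theta * (cmath.exp(1j * t) - 1))

def uniform_cf(t, theta):
    """Characteristic function of the uniform distribution on (0, theta)."""
    if t == 0:
        return complex(1.0)
    return (cmath.exp(1j * t * theta) - 1) / (1j * t * theta)

def reproductivity_gap(cf, theta1, theta2, ts):
    """Largest value of |cf(t;th1) cf(t;th2) - cf(t;th1+th2)| over ts."""
    return max(abs(cf(t, theta1) * cf(t, theta2) - cf(t, theta1 + theta2))
               for t in ts)

ts = [0.5 * j for j in range(-10, 11)]
gap_gamma = reproductivity_gap(gamma_type_cf, 1.0, 2.0, ts)
gap_poisson = reproductivity_gap(poisson_cf, 1.0, 2.0, ts)
gap_uniform = reproductivity_gap(uniform_cf, 1.0, 2.0, ts)
```

The first two gaps vanish up to rounding, while the uniform family shows a clear discrepancy.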

5.4 CHARACTERISTIC FUNCTIONS OF A SEQUENCE
OF RANDOM VARIABLES

As we have stated in Section 4.1, one of the important problems in
probability theory and its application to sampling theory and related
topics is that of the convergence in probability of a sequence of random
variables. Since c.d.f.'s are uniquely determined by characteristic
functions, the problem of convergence in probability of a sequence of random
variables can often be more easily handled by dealing with the convergence
of the corresponding sequence of characteristic functions than by dealing
directly with c.d.f.'s of the random variables. The fundamental principle
involved here is expressed by the following theorem due to Lévy (1937)
and Cramér (1937).

5.4.1 Let $(x_1, x_2, \ldots)$ be a sequence of random variables. Let $\varphi_1(t)$,
$\varphi_2(t), \ldots$ be the corresponding sequence of characteristic functions. A
necessary and sufficient condition for $(x_1, x_2, \ldots)$ to converge in
distribution to a c.d.f. $F(x)$ is that for every value of $t$, the sequence
$\varphi_1(t), \varphi_2(t), \ldots$ converge to a limit $\varphi(t)$, which is continuous at $t = 0$.
Under these conditions $\varphi(t)$ is identical with the characteristic
function corresponding to $F(x)$.

First, let us show that the condition is necessary. If $F_1(x), F_2(x), \ldots$
are the c.d.f.'s of $(x_1, x_2, \ldots)$ and $F(x)$ is the limit c.d.f., then we assume
that the sequence $F_1(x), F_2(x), \ldots$ converges to the c.d.f. $F(x)$ for every $x$,
and we must show that $\lim_{n \to \infty} \varphi_n(t) = \varphi(t)$ for every $t$. Now, we may write

(5.4.1)  $\varphi_n(t) = \int_{-\infty}^{\infty} \cos tx\,dF_n(x) + i \int_{-\infty}^{\infty} \sin tx\,dF_n(x)$.

Since $\cos tx$ is bounded on $(-\infty, +\infty)$ for every $t$ and since $F_1(x), F_2(x), \ldots$
converges to the c.d.f. $F(x)$ for every $x$, we have by the Helly-Bray theorem
[see Loève (1955), for instance],

(5.4.2)  $\lim_{n \to \infty} \int_{-\infty}^{\infty} \cos tx\,dF_n(x) = \int_{-\infty}^{\infty} \cos tx\,dF(x)$,

and similarly,

(5.4.3)  $\lim_{n \to \infty} \int_{-\infty}^{\infty} \sin tx\,dF_n(x) = \int_{-\infty}^{\infty} \sin tx\,dF(x)$.

But (5.4.2) and (5.4.3) taken together are equivalent to the statement

    $\lim_{n \to \infty} \int_{-\infty}^{\infty} e^{itx}\,dF_n(x) = \int_{-\infty}^{\infty} e^{itx}\,dF(x)$,

that is, $\lim_{n \to \infty} \varphi_n(t) = \varphi(t)$.

Now consider the sufficiency of the condition. We assume that
$\lim_{n \to \infty} \varphi_n(t) = \varphi(t)$ for every $t$, and that $\varphi(t)$ is continuous at $t = 0$. Now it
can be shown [see Cramér (1946), for instance] that the sequence of c.d.f.'s
$F_1(x), F_2(x), \ldots$ contains a subsequence $F_{n_1}(x), F_{n_2}(x), \ldots$ which converges
to a nondecreasing function $F(x)$ continuous on the right. Our problem
now is to show that the function $F(x)$ to which the subsequence
$F_{n_1}(x), F_{n_2}(x), \ldots$ converges, and which is nondecreasing and continuous
on the right, satisfies the remaining conditions for a c.d.f., namely,
$F(-\infty) = 0$ and $F(+\infty) = 1$. It is clear that $0 \leq F(x) \leq 1$.

Now it can be shown by argument similar to that used in establishing
(5.1.14) that for $c > 0$

(5.4.4)  $c \left[ \frac{1}{c} \int_{0}^{c} F_{n_j}(y)\,dy - \frac{1}{c} \int_{-c}^{0} F_{n_j}(y)\,dy \right] = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{1 - \cos ct}{t^2}\, \varphi_{n_j}(t)\,dt$.

In the case of the first integral on the left of (5.4.4) it will be noted that if $y$
is regarded as a continuous random variable having p.d.f. $1/c$ in the
interval $(0, c)$, then $F_{n_j}(y)$ will be a random variable which is a function of
the parameter $n_j$. Since $F_{n_1}(x), F_{n_2}(x), \ldots$ converges to $F(x)$, and since all
these functions lie on the interval $[0, 1]$, we have

    $\lim_{j \to \infty} \frac{1}{c} \int_{0}^{c} F_{n_j}(y)\,dy = \frac{1}{c} \int_{0}^{c} F(y)\,dy$.

Similarly,

    $\lim_{j \to \infty} \frac{1}{c} \int_{-c}^{0} F_{n_j}(y)\,dy = \frac{1}{c} \int_{-c}^{0} F(y)\,dy$.




By similar considerations it can be shown that we may take the limit as
$j \to \infty$ under the integral on the right of (5.4.4). We obtain

(5.4.5)  $c \left[ \frac{1}{c} \int_{0}^{c} F(y)\,dy - \frac{1}{c} \int_{-c}^{0} F(y)\,dy \right] = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{1 - \cos ct}{t^2}\, \varphi(t)\,dt$.

Setting $t = u/c$ and dividing both sides of (5.4.5) by $c$, we obtain

(5.4.6)  $\frac{1}{c} \int_{0}^{c} F(y)\,dy - \frac{1}{c} \int_{-c}^{0} F(y)\,dy = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{1 - \cos u}{u^2}\, \varphi\!\left( \frac{u}{c} \right) du$.

If we let $c \to \infty$, then since $F(y)$ is nondecreasing, the limit of the
left-hand side of (5.4.6) is $F(+\infty) - F(-\infty)$. Since $\varphi(t)$ is continuous at
$t = 0$, we have $\lim_{c \to \infty} \varphi(u/c) = \varphi(0)$ for every $u$. But $\varphi_1(t), \varphi_2(t), \ldots$
converges to $\varphi(t)$ for every $t$. Hence $\lim_{n \to \infty} \varphi_n(0) = \varphi(0)$. But $\varphi_n(0) = 1$ for
every $n$. Therefore $\varphi(0) = 1$, and we finally obtain from (5.4.6) by letting
$c \to \infty$

(5.4.7)  $F(+\infty) - F(-\infty) = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{1 - \cos u}{u^2}\,du = 1$.

Since $F(x)$ is non-negative, nondecreasing and cannot exceed 1, we must
have $F(+\infty) = 1$ and $F(-\infty) = 0$. Hence $F(x)$, the limit of $F_{n_1}(x)$,
$F_{n_2}(x), \ldots$, is a c.d.f. and its characteristic function is $\varphi(t)$, the limit of
$\varphi_{n_1}(t), \varphi_{n_2}(t), \ldots$. Now if there is any other subsequence of $F_1(x), F_2(x), \ldots$
which converges to a nondecreasing function, let this function be $F^*(x)$.
Then it can be shown by argument similar to that used above that $F^*(x)$
is a c.d.f. whose characteristic function is identical with $\varphi(t)$. Then since
$F(x)$ and $F^*(x)$ are c.d.f.'s with identical characteristic functions, it follows
from 5.1.3 that $F(x) \equiv F^*(x)$. This means that every convergent
subsequence of $F_1(x), F_2(x), \ldots$ converges to the c.d.f. $F(x)$, which, of course,
is equivalent to the statement that $F_1(x), F_2(x), \ldots$ converges to the c.d.f.
$F(x)$, which concludes the argument for 5.4.1.
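
Theorem 5.4.1 can be illustrated numerically with a sequence whose characteristic functions are available in closed form. In the sketch below, which rests on assumptions made only for the example, each $x_i$ takes the values $+1$ and $-1$ with probability $1/2$, so its characteristic function is $\cos t$, and by (5.3.6) the characteristic function of $(x_1 + \cdots + x_n)/\sqrt{n}$ is $[\cos(t/\sqrt{n})]^n$. The limit used for comparison is $e^{-t^2/2}$, the well-known characteristic function of the density $e^{-x^2/2}/\sqrt{2\pi}$, which is continuous at $t = 0$.

```python
import math

def standardized_sum_cf(t, n):
    """cf of (x_1 + ... + x_n)/sqrt(n) for independent x_i equal to +1 or -1
    with probability 1/2 each; by (5.3.6) it equals cos(t/sqrt(n))**n."""
    return math.cos(t / math.sqrt(n)) ** n

def limit_cf(t):
    """exp(-t^2/2), the assumed limiting characteristic function."""
    return math.exp(-t * t / 2)

grid = [0.1 * j for j in range(-30, 31)]
sizes = (10, 100, 1000, 10000)
gaps = [max(abs(standardized_sum_cf(t, n) - limit_cf(t)) for t in grid)
        for n in sizes]
```

The largest gap over the grid shrinks steadily as $n$ grows, which is the behavior that 5.4.1 converts into convergence in distribution.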

The following corollary to 5.4.1 is useful in connection with problems
of establishing convergence in probability to a constant:

5.4.1a A necessary and sufficient condition for a sequence of random
variables $x_1, x_2, \ldots$ having characteristic functions $\varphi_1(t), \varphi_2(t), \ldots$
to converge in probability to the constant $c$ is that $\lim_{n \to \infty} \varphi_n(t) = e^{ict}$.

The $k$-dimensional analogues of 5.4.1 and 5.4.1a are straightforward and
are left to the reader for formulation and proof.
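
The corollary can be illustrated with the mean of $n$ independent random variables each having characteristic function $(1 - it)^{-1}$, the case $k = 1$ of (5.1.22). By (5.3.6) the mean has characteristic function $(1 - it/n)^{-n}$, and the sketch below, an illustration rather than a proof, measures how far this is from $e^{it}$, the characteristic function of the constant 1, on a grid of $t$ values. The grid and the values of $n$ are arbitrary choices.

```python
import cmath

def mean_cf(t, n):
    """cf of the mean of n independent variables each with cf 1/(1 - it)."""
    return (1 - 1j * t / n) ** (-n)

def constant_cf(t, c=1.0):
    """cf of a random variable equal to the constant c."""
    return cmath.exp(1j * t * c)

grid = [0.1 * j for j in range(-30, 31)]
sizes = (10, 100, 1000, 10000)
gaps = [max(abs(mean_cf(t, n) - constant_cf(t)) for t in grid) for n in sizes]
```

The gaps decrease roughly like $1/n$, consistent with convergence in probability of the mean to 1.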
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5.5 DETERMINATION OF DISTRIBUTION FUNCTIONS 
FROM MOMENTS 

(a) Determination of a c.d.f. by a Moment-Sequence 

In Section 5.1 it was seen that moments of distribution functions can 
be found, if they exist, by differentiating characteristic functions and 
(5.1.10) shows the relationship between the characteristic functions and 
those moments which do exist. Sometimes, however, the moments of 
the c.d.f. of a random variable can be found more effectively by other 
methods than by differentiating characteristic functions. There are various 
situations which occur in later chapters where moment-sequences of 
random variables and functions of random variables are easily determined 
one way or another. But the basic question is this: Under what
conditions does a moment-sequence $\mu'_1, \mu'_2, \ldots$ of a random variable $x$
determine the c.d.f. of $x$ uniquely? A useful sufficient criterion [see
Cramér (1946)] may be stated as follows:

5.5.1 Let $F(x)$ be a c.d.f. with moments $\mu'_r$, $r = 0, 1, 2, \ldots$, all of which
are finite. If the series $\sum_{r=0}^{\infty} \frac{\mu'_r}{r!} c^r$ is absolutely convergent for some
$c > 0$ then $F(x)$ is the only c.d.f. having these moments.

From the definition (5.1.1) of the characteristic function belonging to
$F(x)$ and following a line of argument similar to that by which (5.1.10)
was established, we may write

(5.5.1)  $\varphi(t + u) = \int_{-\infty}^{\infty} e^{itx} \left[ 1 + \sum_{r=1}^{n-1} \frac{(iux)^r}{r!} + \frac{(iux)^n}{n!} (\cos u'x + i \sin u''x) \right] dF(x)$

where $u'$ and $u''$ are (real) numbers in the interval $(0, u)$. If the $n$th moment
exists, we may apply (5.1.4) and obtain

(5.5.2)  $\varphi(t + u) = \sum_{r=0}^{n-1} \frac{u^r}{r!} \varphi^{(r)}(t) + \vartheta\, \frac{\mathscr{E}(|x|^n)\, u^n}{n!}$

where $\vartheta$ is a complex function such that $|\vartheta| \leq 1$, and where $\mathscr{E}(|x|^n)$ is the $n$th
absolute moment of $x$. If $n$ is even, then $\mathscr{E}(|x|^n) = \mu'_n$ and it follows from
the hypothesis of our theorem that the remainder term $\to 0$ as
$n \to \infty$ if $|u| < c$. Now consider the remainder term for $n$ odd. We note
that for any real $\lambda$

(5.5.3)  $\mathscr{E}\left[ \left( \lambda |x|^{(n-1)/2} + |x|^{(n+1)/2} \right)^2 \right] \geq 0$

from which we must have

(5.5.4)  $[\mathscr{E}(|x|^n)]^2 \leq \mu'_{n-1}\, \mu'_{n+1}$.

We may therefore write

(5.5.5)  $\frac{\mathscr{E}(|x|^n)\, |u|^n}{n!} \leq \left[ \left( \frac{\mu'_{n-1} |u|^{n-1}}{(n-1)!} \right) \left( \frac{\mu'_{n+1} |u|^{n+1}}{(n+1)!} \right) \left( \frac{n+1}{n} \right) \right]^{1/2}$.

Now for $|u| < c$ both $\frac{\mu'_{n-1} |u|^{n-1}}{(n-1)!}$ and $\frac{\mu'_{n+1} |u|^{n+1}}{(n+1)!}$ tend to 0, and hence the
right-hand side of this inequality vanishes as $n \to \infty$, which implies that as
$n \to \infty$ ($n$ odd) the remainder term of (5.5.2) vanishes. Therefore we have

(5.5.6)  $\varphi(t + u) = \varphi(t) + \sum_{r=1}^{\infty} \frac{u^r}{r!} \varphi^{(r)}(t)$

the series being convergent at least for $|u| < c$. Putting $t = 0$ and using
(5.1.5), we have

(5.5.7)  $\varphi(u) = 1 + \sum_{r=1}^{\infty} \frac{\mu'_r}{r!} (iu)^r$

which means that for $|u| < c$, $\varphi(u)$ is uniquely determined by the moments
$\mu'_r$. By the process of analytic continuation it can be shown that formula
(5.5.7) holds for all values of $u$. For all derivatives of $\varphi(u)$ exist, for
example, at $u = \pm\frac{1}{2}c$ and can be determined from (5.5.7), and hence for
$|u| < c$, $\varphi(\frac{1}{2}c + u)$ and $\varphi(-\frac{1}{2}c + u)$ can be uniquely expressed by the
series (5.5.6) by replacing $t$ by $+\frac{1}{2}c$ and $-\frac{1}{2}c$ respectively. This is
equivalent to stating that (5.5.7) is valid for $|u| < \frac{3}{2}c$. Continuing this
procedure step-by-step it is clear that (5.5.7) holds for all values of $u$. Thus,
under the hypotheses of our theorem the characteristic function is uniquely
determined by the moment-sequence and the characteristic function, in
turn, uniquely determines the c.d.f. $F(x)$ as stated in 5.5.1.
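
The series (5.5.7) can be summed numerically when the moments are known. The sketch below is an illustration under assumptions chosen here: the moments are $\mu'_r = r!$, which is the case $k = 1$ of the Example in Section 5.1, where the characteristic function is $(1 - it)^{-1}$. For these moments the series in 5.5.1 reduces to $\sum c^r$, which converges for $c < 1$, so the partial sums are compared with the characteristic function at a point with $|u| < 1$.

```python
import math

def cf_from_moments(moments, u):
    """Partial sum of (5.5.7): 1 + sum over r of mu'_r (iu)^r / r!,
    where moments[r - 1] holds mu'_r."""
    total = 1 + 0j
    term = 1 + 0j
    for r, mu in enumerate(moments, start=1):
        term *= 1j * u / r          # term now equals (iu)^r / r!
        total += mu * term
    return total

# Moments mu'_r = r! for r = 1, ..., 60 (assumed for the example).
moments = [math.factorial(r) for r in range(1, 61)]

u = 0.5
series_value = cf_from_moments(moments, u)
closed_form = 1 / (1 - 1j * u)
```

At $u = 0.5$ the partial sum agrees with $(1 - iu)^{-1}$ to many decimal places. For $|u| \geq 1$ the same partial sums do not settle down, which is why the argument above extends the result beyond $|u| < c$ step by step instead of by summing the series directly.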

The following corollary of 5.5.1 is useful in determining distribution
functions defined over finite intervals, given the moment-sequence of such
distributions:

5.5.1a If $x$ is a bounded random variable, then its c.d.f. $F(x)$ is uniquely
determined by its moments $\mu'_r$, $r = 0, 1, 2, \ldots$.

If $x$ is bounded, then there are finite numbers $a$, $b$, $a < b$, such that
$F(a) = 0$ and $F(b) = 1$. If $M$ denotes the largest of $|a|$ and $|b|$, then we
have

(5.5.8)  $|\mu'_r| \leq \mathscr{E}(|x|^r) \leq M^r$

and

(5.5.9)  $\sum_{r=0}^{\infty} \frac{|\mu'_r|\, c^r}{r!} \leq \sum_{r=0}^{\infty} \frac{\mathscr{E}(|x|^r)\, c^r}{r!} \leq \sum_{r=0}^{\infty} \frac{M^r c^r}{r!} = e^{Mc}$

which is finite for all values of $c$. Therefore the sufficient condition in
5.5.1 is satisfied, thus establishing 5.5.1a.

Extended versions of 5.5.1 and 5.5.1a for the unique determination of
a c.d.f. of a $k$-dimensional random variable from the moments
$\mu'_{r_1 \cdots r_k}$ can be stated. Formulation and proof of the extensions are
left to the reader.

(b) Determination of the Limit of a Sequence of c.d.f.’s by Moments 

Suppose $(x_1, x_2, \ldots)$ is a sequence of random variables and the
moment-sequence of each component is given. A question arising is whether
there exist simple criteria based on these moments for determining whether
$(x_1, x_2, \ldots)$ converges in distribution to some random variable $x$. A simple
theorem due to Kendall and Rao (1950) concerning the convergence in
distribution of subsequences from $(x_1, x_2, \ldots)$ to a c.d.f. using only the
second moments may be stated as follows:

5.5.2 Let $\mu'_2(x_n)$ be the second moment of $x_n$ in the sequence of random
variables $(x_1, x_2, \ldots)$. If

(5.5.10)  $\mu'_2(x_n) < K < +\infty$

for $n = 1, 2, \ldots$, then there is a subsequence of $(x_1, x_2, \ldots)$ which
converges in distribution.

To prove this let $F_n(x)$ be the c.d.f. of $x_n$. We have for any $x_0 > 0$

(5.5.11)  $K > \mu'_2(x_n) = \int_{-\infty}^{\infty} x^2\,dF_n(x) \geq \int_{-\infty}^{-x_0} x_0^2\,dF_n(x) + \int_{x_0}^{\infty} x_0^2\,dF_n(x)$.

Therefore, we may write

(5.5.12)  $\frac{K}{x_0^2} > F_n(-x_0) + 1 - F_n(x_0)$,  $n = 1, 2, \ldots$

For a given $\varepsilon > 0$, we can therefore choose $x_0 > 0$ so that
$1 - [F_n(x) - F_n(-x)] < \varepsilon$ for $x > x_0$ and for all $n$. A subsequence of
$(x_1, x_2, \ldots)$ can be found, as pointed out in the proof of 5.4.1, whose c.d.f.'s
converge to a nondecreasing function $G(x)$ at all of its points of continuity. Then
clearly for $x > x_0$ we have $1 - [G(x) - G(-x)] \leq \varepsilon$, which implies that
$G(-\infty) = 0$ and $G(+\infty) = 1$. Therefore $G(x)$ is a c.d.f., which completes
the argument that a subsequence of $(x_1, x_2, \ldots)$ converges in distribution
to some random variable $x$.

Now suppose we have a complete moment-sequence $\mu'_r(x_n) = \mu'_{r,n}$,
$r = 0, 1, 2, \ldots$ for each component $x_n$ in the sequence of random variables
$(x_1, x_2, \ldots)$ such that the $\mu'_{r,n}$ are all finite and $\lim_{n \to \infty} \mu'_{r,n} = \mu'_r$, $r = 0, 1, 2, \ldots$,
where the limit-sequence $\mu'_r$, $r = 0, 1, 2, \ldots$ uniquely determines a
c.d.f. $F(x)$. What conditions will guarantee that the sequence of random
variables $(x_1, x_2, \ldots)$ converges in distribution to a random variable $x$
having c.d.f. identical with $F(x)$?

An answer to this question due to Kendall and Rao (1950) can be
stated as follows:

5.5.3 Let $(x_1, x_2, \ldots)$ be a sequence of random variables. Let the $r$th
moment of $x_n$ be $\mu'_{r,n}$ and finite for all $n$ and $r$. Let $\lim_{n \to \infty} \mu'_{r,n} = \mu'_r$
where $\mu'_r$ is finite for all $r$. Then if $(x_1, x_2, \ldots)$ converges in
distribution to $F(x)$, $\mu'_1, \mu'_2, \ldots$ is the moment-sequence of $F(x)$.
Conversely, if this moment-sequence uniquely determines a c.d.f. $F(x)$,
it is the limiting c.d.f. of $(x_1, x_2, \ldots)$.

To establish 5.5.3 we first assume that $(x_1, x_2, \ldots)$ converges in
distribution to a c.d.f. $F(x)$ and we must then show that $\mu'_r = \lim_{n \to \infty} \mu'_{r,n}$,
$r = 1, 2, \ldots$, is the moment-sequence of $F(x)$. That is, if $F_n(x)$ is the c.d.f. of
$x_n$, $n = 1, 2, \ldots$, we must show that

(5.5.13)  $\lim_{n \to \infty} \left| \int_{-\infty}^{\infty} x^r\,dF_n(x) - \int_{-\infty}^{\infty} x^r\,dF(x) \right| = 0$,  for $r = 1, 2, \ldots$.

Note that for any $K > 0$

(5.5.14)  $\left| \int_{-\infty}^{\infty} x^r\,dF_n(x) - \int_{-\infty}^{\infty} x^r\,dF(x) \right| \leq A_1 + A_2 + A_3$,

where

    $A_1 = \left| \int_{-K}^{K} x^r\,dF_n(x) - \int_{-K}^{K} x^r\,dF(x) \right|$

    $A_2 = \left| \int_{E_K} x^r\,dF_n(x) \right|$

    $A_3 = \left| \int_{E_K} x^r\,dF(x) \right|$

and where $E_K$ is the set of values of $x$ for which $|x| > K$. It follows from
Schwarz' inequality that

(5.5.15)  $A_2^2 \leq \int_{E_K} x^{2r}\,dF_n(x) \cdot \int_{E_K} dF_n(x)$,

both integrals being non-negative. Since $\mu'_{2r,n} \to \mu'_{2r}$ (finite) as $n \to \infty$,
there exists a positive constant which bounds the first integral for all
$n$ and $K$. Since $F_n(x) \to F(x)$ as $n \to \infty$, the second integral on the right,
and hence $A_2$, can be made arbitrarily small for all $n$ by choosing $K$
sufficiently large.

Since $\mu'_r$ is finite, $A_3$ can be made arbitrarily small by choosing $K$
sufficiently large.

Since $F_n(x) \to F(x)$ as $n \to \infty$ and both $\mu'_{r,n}$ and $\mu'_r$ are finite, it is
evident that for any fixed $K$, $A_1$ can be made arbitrarily small by choosing
$n$ sufficiently large.

Therefore we conclude that (5.5.13) holds and hence that $\lim_{n \to \infty} \mu'_{r,n} = \mu'_r$,
$r = 1, 2, \ldots$; $\mu'_1, \mu'_2, \ldots$ being the moment-sequence of $F(x)$, the limit
of the sequence of c.d.f.'s of $(x_1, x_2, \ldots)$.

Now consider the converse statement in 5.5.3. We assume that
$\mu'_1, \mu'_2, \ldots$ uniquely determine a c.d.f. $F(x)$ and we must show that $F_1(x)$,
$F_2(x), \ldots$ converge to a c.d.f. which is $F(x)$. We know from 5.5.2 that
every convergent subsequence of $F_1(x), F_2(x), \ldots$ converges to some
c.d.f. and from the argument given in the first part of the present theorem
we know that the limiting c.d.f.'s for these subsequences must all have the
same moment-sequence, namely, $\mu'_1, \mu'_2, \ldots$. But this moment-sequence
is assumed to determine a c.d.f. uniquely. Therefore these limiting c.d.f.'s
are all identical to $F(x)$, namely, the c.d.f. having the moment-sequence
$\mu'_1, \mu'_2, \ldots$.

Finally, it should be noted that the condition that $\mu'_{r,n}$ be finite for all
$n$ and $r$ can be replaced by the condition that $\mu'_{r,n}$ be finite for all $r$ and all
$n$ greater than some integer $n^*$ possibly depending on $r$ without affecting
the conclusions of 5.5.3.
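
A simple case in which the hypotheses of 5.5.3 can be verified by direct computation is sketched below. It is an illustration under assumptions made only here: $x_n$ is taken to be uniformly distributed over the $n$ points $1/n, 2/n, \ldots, 1$, so that its $r$th moment is an elementary sum, and the limiting moments are $1/(r+1)$, which are the moments of the density equal to 1 on $(0, 1)$. That limit is a bounded distribution, so by 5.5.1a it is uniquely determined by its moments.

```python
from fractions import Fraction

def moment_discrete_uniform(n, r):
    """r-th moment of x_n, uniform on the points 1/n, 2/n, ..., 1."""
    return sum(Fraction(j, n) ** r for j in range(1, n + 1)) / n

def limit_moment(r):
    """r-th moment of the density equal to 1 on (0, 1)."""
    return Fraction(1, r + 1)

orders = (1, 2, 3, 4)
sizes = (10, 100, 1000)
# Largest moment discrepancy over the orders considered, for each n.
gaps = [max(abs(moment_discrete_uniform(n, r) - limit_moment(r)) for r in orders)
        for n in sizes]
```

The discrepancies shrink roughly like $1/(2n)$, so each moment of $x_n$ converges to the corresponding limiting moment, and the converse part of 5.5.3 then gives convergence in distribution.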

PROBLEMS 

5.1 If a random variable $x$ has a p.d.f. (or p.f.) which is symmetric about the
vertical axis, show that the characteristic function $\varphi(t)$ of $x$ takes on only real
values.

5.2 A random variable x has characteristic function 


Show that the p.d.f. of $x$ is for any value of $x$ in $R_1$.

5.3 The characteristic function of a random variable $x$ is

    $\frac{e^{it}(1 - e^{int})}{n(1 - e^{it})}$.

Show that $x$ is a discrete random variable with p.f. $p(x) = 1/n$ for $x = 1, \ldots, n$.

5.4 If is the characteristic function of a random variable $x$, show that
the p.d.f. of $x$ is

5.5 Prove 5.2.1. 

5.6 If $x_1, \ldots, x_n$ are independent random variables whose c.d.f.'s are all
$F(x)$ show that the characteristic function of $x_1 + \cdots + x_n$ is $[\varphi(t)]^n$ where $\varphi(t)$
is the characteristic function of a random variable $x$ having c.d.f. $F(x)$.

5.7 A random variable $x$ is said to have a Cauchy (1853) distribution if its
p.d.f. is

    $f(x) = \frac{k}{\pi [k^2 + (x - \mu)^2]}$

for $-\infty < x < +\infty$, where $k$ and $\mu$ are real constants and $k > 0$. Show that
the characteristic function of $x$ is given by

    $\varphi(t) = e^{i\mu t - k|t|}$

and hence that if $x_1, \ldots, x_n$ are independent random variables all having this
Cauchy distribution, the random variable $(x_1 + \cdots + x_n)/n$ also has the same
Cauchy distribution.

5.8 If $x_1, \ldots, x_n$ are independent random variables all having the same
p.d.f. in $R_1$, namely, $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, show, by using characteristic functions, that
the random variable $(x_1 + \cdots + x_n)/\sqrt{n}$ also has the same p.d.f.

5.9 Suppose $x$ is a random variable denoting the number of times a die must
be thrown in order to obtain one ace. Determine the probability generating
function $\theta(t)$ of $x$. Find the mean and variance of $x$ from $\theta(t)$.

5.10 Suppose $x$ is a random variable denoting the number of times a “true”
coin must be thrown in order to get $k$ heads. If $F(x; k)$ is the c.d.f. of $x$ show that
$F(x; k)$ is reproductive with respect to $k$.

5.11 If $x$ is a random variable denoting the number of spades in a hand of 13
ordinary playing cards, show that the probability generating function $\theta(t)$ of $x$ is
the coefficient of $v^{13}$ in $(1 + tv)^{13} (1 + v)^{39} \big/ \binom{52}{13}$. From $\theta(t)$ find the mean and
variance of $x$.


5.12 A random variable x has 

k 




^ +r’ 


r = 1,2,. 


where k > 0. Show that the p.d.f. of x is given by 

f{x) = kx^-^ for 0 < a; < 1 

= 0 otherwise 

and that the distribution of x is unique. 

5.13 If a; is a random variable having as its rth moment. 


(jk + rV 

^, k being a positive integer. 


show that its p.d.f. is 


/W 


a? < 0 


and that the distribution of x is unique. 


a; > 0 
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5.14 If X 2 ) is a bounded two-dimensional random variable such that for 

r, 5 =» 1, 2,..., all exist, and ^(xlxp « ^(a;J) • show 

that x^ and X 2 are independent. 

5.15 Let $x$ be a random variable denoting the number of dots which appear 
when $n$ “true” dice are thrown simultaneously. Show that the moment 
generating function $\psi(t)$ of $x$ is $(e^{t}/6)^{n} \left[(1 - e^{6t})/(1 - e^{t})\right]^{n}$ and that the 
distribution of $x$ is reproductive with respect to $n$. 

5.16 In a sequence of random variables $(x_1, x_2, \ldots)$, $x_n$ has p.d.f. $n(1 - x)^{n-1}$ 
on $(0, 1)$ and 0 otherwise. Let $y_n = n x_n$. Show, by using characteristic functions, 
that the sequence $(y_1, y_2, \ldots)$ converges in distribution to a random variable $y$ 
having p.d.f. $e^{-y}$ on $(0, \infty)$, and 0 otherwise. 

5.17 In a sequence of random variables $(x_1, x_2, \ldots)$, $x_n$ has characteristic 
function 

$\varphi_n(t) = \dfrac{\sin nt}{nt}$. 

Show that the p.d.f. of $x_n$ is $1/(2n)$ on $(-n, +n)$ and 0 elsewhere, and hence that 
even though the sequence of characteristic functions converges to a limit $\varphi(t)$, the 
sequence of c.d.f.’s does not converge to a c.d.f. What condition of 5.4.1 is 
violated? [Cramér (1946)]. 

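5.18 If $(x_1, \ldots, x_n)$ are independent discrete random variables such that the 
p.f. of $x_i$ is 

$p(x_i) = \dfrac{\mu_i^{x_i} e^{-\mu_i}}{x_i!}, \qquad x_i = 0, 1, 2, \ldots; \quad \mu_i > 0,$ 

and if $z$ is the random variable $x_1 + \cdots + x_n$, show by using characteristic 
functions that the p.f. of $z$ is given by 

$p(z) = \dfrac{(\mu_1 + \cdots + \mu_n)^{z} e^{-(\mu_1 + \cdots + \mu_n)}}{z!}, \qquad z = 0, 1, 2, \ldots$ 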


5.19 A sequence of non-negative random variables $(x_1, x_2, \ldots)$ is such that 
the $r$th moment of $x_n$ is given by 

$\mu'_r = \dfrac{r!\, n^{r} (n - r - 1)!}{(n - 1)!},$ 

$n > r = 1, 2, \ldots$. Show that the sequence of random variables converges in 
distribution to a random variable $x$ having p.d.f. 

$f(x) = \begin{cases} 0, & x < 0 \\ e^{-x}, & x > 0. \end{cases}$ 

Show that this limiting distribution is uniquely determined by its moments. 

5.20 Consider a stochastic branching process, where an object can generate 
other objects, each of which can generate others, and so on, such as occurs for 
males (or females) in successive generations, or for neutrons in a nuclear fission 
process. 

Let $x_1$ be a random variable denoting the number of objects an initial 
(zeroth-generation) object produces. Let $x_2$ be a random variable denoting the number 
of objects each first-generation object produces. In general let $x_{n+1}$ be a random 
variable denoting the number of objects each $n$th-generation object produces. 
Assume the p.f.’s of $x_1, x_2, \ldots$ are all identical, namely, $p(x)$, $x = 0, 1, 2, \ldots$. 
Let $y_n$ be a random variable denoting the total number of objects produced in 
the $n$th generation. Let $\theta_n(t)$ be the probability generating function of $y_n$. [Note 
that $\theta_0(t) = t$ and $\theta_1(t) = \sum_{x=0}^{\infty} p(x) t^{x}$.] Assuming all probability generating 
functions converge, show that 

$\theta_{n+1}(t) = \theta_1[\theta_n(t)]$. 

If we denote by $\mu$ and $\sigma^2$ the mean and variance of $y_1$, show that 

$\mathcal{E}(y_n) = \mu^{n}$ 

$\sigma^2(y_n) = n\sigma^2$, if $\mu = 1$ 

$\sigma^2(y_n) = \sigma^2 \mu^{n-1}(\mu^{n} - 1)/(\mu - 1)$, if $\mu \ne 1$. [Harris (1948)] 

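5.21 (Continuation). If the random variables $x_1, \ldots, x_n$ have different 
p.f.’s, say $p_1(x), p_2(x), \ldots, p_n(x)$ and corresponding probability generating 
functions $\theta^{(1)}(t), \theta^{(2)}(t), \ldots, \theta^{(n)}(t)$, respectively, show that the probability generating 
function $\theta_n(t)$ of $y_n$ is given by 

$\theta_n(t) = \theta^{(1)}[\theta^{(2)}\{\cdots \theta^{(n)}(t) \cdots\}]$, 

and hence that 

$\mathcal{E}(y_n) = \mu_1 \cdots \mu_n$, 

where $\mu_1, \ldots, \mu_n$ are the means of $x_1, \ldots, x_n$, respectively. 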

5.22 Establish (5.4.4). 

5.23 Suppose $x$ is a random variable with p.f. $p(x)$, and c.d.f. $F(x)$, and 
whose sample space is a set of non-negative integers. Let $\theta^{*}(t) = \sum_{x=0}^{\infty} F(x) t^{x}$ be 
the generating function for $F(x)$. If $\theta(t)$ is the probability generating function of 
$x$, show that for $|t| < 1$ 

$\theta^{*}(t) = \theta(t)/(1 - t)$. 

5.24 If $x_1, \ldots, x_k$ are independent random variables whose sample spaces 
are non-negative integers and whose probability generating functions are 
$\theta_1(t), \ldots, \theta_k(t)$, show that the probability generating function of $x_1 + \cdots + x_k$ 
is $\theta_1(t) \cdots \theta_k(t)$. 



CHAPTER 6 


Some Special Discrete Distributions 


The purpose of this chapter is to present some of the more important 
discrete probability distributions of mathematical statistics, not only to 
provide certain basic information about the distributions themselves but 
to illustrate some of the concepts, principles, and methods of Chapters 
1 through 5 more fully than was done by the examples in those chapters. 
The distributions discussed in this chapter, and some of their main 
properties, will also be used at various points throughout the remainder of 
the book. Moreover, their discussion will show that, in spite of the general 
principles and methods introduced in earlier chapters, the study of special 
distributions often requires special methods and devices. 


6.1 THE HYPERGEOMETRIC DISTRIBUTION 


(a) The Case of One Random Variable 


The probability function $p(x)$ used at the end of Section 2.3(a) is a 
simple case of a hypergeometric distribution. More generally, suppose we 
have a collection $\Pi$ of $N$ elements such that each element belongs either to 
class $C$ or to class $\bar{C}$. Let $Np$ be the number of elements belonging to $C$ 
and $Nq$ the number belonging to $\bar{C}$, where $p + q = 1$. Suppose a set of 
$n$ ($\le N$) elements is taken from $\Pi$ and we let $x$ be a random variable 
denoting the number of C’s in the set. We wish to find the p.f. of $x$. 

There are $\binom{N}{n}$ possible sets of $n$ elements in $\Pi$, of which 
$\binom{Np}{x}\binom{Nq}{n-x}$ will contain exactly $x$ C’s. 


It should be noted that for this situation the basic sample space $R$ discussed 
in Chapter 1 consists of $\binom{N}{n}$ sample points, each sample point being a set of $n$ 
elements from $\Pi$. The event for which $x \le x'$ consists of all sample points in $R$ 
for which the number of C’s is $\le x'$. 

Assigning equal probabilities to all sample points in $R$, we have for the 
p.f. of $x$ the following: 

(6.1.1)  $p(x) = \dfrac{\binom{Np}{x}\binom{Nq}{n-x}}{\binom{N}{n}}$, 

the mass points of $x$ (sample space of $x$) in $R_1$ being the integers satisfying 
both inequalities $0 \le x \le Np$ and $0 \le n - x \le Nq$. 

This is called the hypergeometric probability function. It will be 
convenient to refer to a distribution having this p.f. as the hypergeometric 
distribution $H(N, n; p)$. 

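To see that the sum of $p(x)$ over this range of values of $x$ is unity consider 
the identity 

$(u + v)^{A+B} \equiv (u + v)^{A}(u + v)^{B}$ 

where $A$ and $B$ are positive integers. Expanding both sides of this 
expression, we have 

$\sum_{r=0}^{A+B} \binom{A+B}{r} u^{r} v^{A+B-r} \equiv \sum_{s=0}^{A} \sum_{t=0}^{B} \binom{A}{s}\binom{B}{t} u^{s+t} v^{A+B-s-t}.$ 

Since this is an identity in $u$ and $v$, the coefficient of $u^{r} v^{A+B-r}$ on the left 
must be identical with the coefficient of $u^{r} v^{A+B-r}$ on the right. Therefore 

(6.1.2)  $\binom{A+B}{r} = \sum_{s} \binom{A}{s}\binom{B}{r-s}$ 

where $\sum_{s}$ denotes summation over the integers $s$ satisfying both inequalities 
$0 \le s \le A$ and $0 \le r - s \le B$. 

It is now clear from (6.1.2) that 

(6.1.2a)  $\sum_{x} \binom{Np}{x}\binom{Nq}{n-x} = \binom{N}{n}$ 

and hence the sum of $p(x)$ over the sample space of $x$ stated in (6.1.1) is 
unity. 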

This hypergeometric distribution is an example of a distribution for 
which the characteristic function is virtually worthless as a device for 
finding moments. But we can determine the factorial moments in a 
reasonably painless fashion as follows. 

First, we note that 

(6.1.3)  $\sum_{x} x^{[r]} \binom{Np}{x}\binom{Nq}{n-x} = (Np)^{[r]} \sum_{x} \binom{Np-r}{x-r}\binom{Nq}{n-x}$, 

the range of $x$ being identically the same as that in (6.1.2a). 
By (6.1.2) the sum on the right has the value $\binom{N-r}{n-r}$. When we 
divide (6.1.3) by $\binom{N}{n}$, the expression on the left defines $\mu_{[r]}$ and its value is 
therefore given by 

(6.1.4)  $\mu_{[r]} = \dfrac{(Np)^{[r]}\, n^{[r]}}{N^{[r]}}$. 

In particular, we have 

$\mu_{[1]} = np, \qquad \mu_{[2]} = \dfrac{np(n-1)(Np-1)}{N-1}$ 

and the following mean and variance 

(6.1.5)  $\mu(x) = np, \qquad \sigma^2(x) = npq\,\dfrac{N-n}{N-1}$. 
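As a numerical check on (6.1.1), (6.1.2a), and (6.1.5), the p.f. can be tabulated with exact rational arithmetic. The sketch below is only an illustration: the function names and the card-hand parameters (13 spades in a 52-card deck, hands of 13) are assumptions made for the example, not part of the text.

```python
from fractions import Fraction
from math import comb


def hypergeom_pf(x, N, n, Np):
    """p(x) of (6.1.1): Np elements of class C among N, n elements drawn."""
    Nq = N - Np
    if x < 0 or x > Np or n - x < 0 or n - x > Nq:
        return Fraction(0)          # outside the mass points of x
    return Fraction(comb(Np, x) * comb(Nq, n - x), comb(N, n))


def hypergeom_summary(N, n, Np):
    """Return (total probability, mean, variance) computed exactly from the p.f."""
    pf = {x: hypergeom_pf(x, N, n, Np) for x in range(n + 1)}
    total = sum(pf.values())
    mean = sum(x * px for x, px in pf.items())
    var = sum(x * x * px for x, px in pf.items()) - mean ** 2
    return total, mean, var


# Illustrative parameters (an assumption): spades in a 13-card hand.
N, n, Np = 52, 13, 13
p = Fraction(Np, N)
q = 1 - p
total, mean, var = hypergeom_summary(N, n, Np)
print(total == 1)                                   # (6.1.2a)
print(mean == n * p)                                # mean in (6.1.5)
print(var == n * p * q * Fraction(N - n, N - 1))    # variance in (6.1.5)
```

Using `Fraction` keeps every comparison exact, so agreement with (6.1.5) is an identity rather than a floating-point approximation.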


The hypergeometric distribution is one of the most important 
distributions in elementary probability theory. It is basic in connection with 
the evaluation of probabilities in problems arising in most card games, in 
drawing samples from lots of mass-produced articles containing defectives 
and other finite populations, etc. 

The p.f. and the c.d.f. of the hypergeometric distribution (6.1.1) have 
been extensively tabulated by Lieberman and Owen (1961). 


(b) The k-Variate Case 

Probability distribution (6.1.1) can be generalized to k random variables. 

The example at the end of Section 2.5(a) is a special case of a 
two-dimensional or bivariate hypergeometric distribution. More generally, 
suppose $\Pi$ is a collection of $N$ elements, such that $Np_i$ elements belong to 
class $C_i$, $i = 1, \ldots, k + 1$, where $p_1 + \cdots + p_{k+1} = 1$. This means that 
the classes $C_1, \ldots, C_{k+1}$ are mutually exclusive, that is, every element of $\Pi$ 
belongs to one and only one of these classes. Now suppose a set of $n$ 
elements is taken from $\Pi$. Let $x_1, \ldots, x_{k+1}$ be random variables denoting 
the number of elements in the set belonging to $C_1, \ldots, C_{k+1}$, respectively. 
These random variables are linearly dependent since their sum is $n$. We 
may consider $x_1, \ldots, x_k$ as linearly independent random variables, and 
write $x_{k+1}$ as $n - x_1 - \cdots - x_k$. 

There are $\binom{N}{n}$ possible sample points of $n$ elements which could be 
formed from $\Pi$, that is, $\binom{N}{n}$ points in the basic sample space $R$. Of 
this number of sample points, exactly $\binom{Np_1}{x_1} \cdots \binom{Np_{k+1}}{x_{k+1}}$ contain 
$x_1$ $C_1$’s, $\ldots$, $x_{k+1}$ $C_{k+1}$’s. Assigning all of these $\binom{N}{n}$ sample points equal 
probabilities, we obtain the following p.f. of $x_1, \ldots, x_k$: 

(6.1.6)  $p(x_1, \ldots, x_k) = \dfrac{\binom{Np_1}{x_1} \cdots \binom{Np_{k+1}}{x_{k+1}}}{\binom{N}{n}}$ 

where $x_{k+1} = n - x_1 - \cdots - x_k$. This is the k-dimensional or k-variate 
hypergeometric probability function. The mass points (sample points) of 
this distribution consist of those points in $R_k$ with coordinates which are 
positive integers or zero in the simplex $x_i \ge 0$, $i = 1, \ldots, k$, $\sum_{i=1}^{k} x_i \le n$ 
for which all the inequalities $0 \le x_i \le Np_i$, $i = 1, \ldots, k + 1$, are satisfied. 

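It will be convenient to refer to a probability distribution having p.f. 
(6.1.6) as the k-variate hypergeometric distribution $H(N, n; p_1, \ldots, p_k)$. 
The means, variances, covariances, and higher moments of (6.1.6) can 
be determined from factorial moments. Denoting the factorial moment 
$\mathcal{E}(x_1^{[r_1]} \cdots x_k^{[r_k]})$ by $\mu_{[r_1] \cdots [r_k]}$ we find by a straightforward extension of the 
method used in arriving at (6.1.4) 

(6.1.7)  $\mu_{[r_1] \cdots [r_k]} = \dfrac{n^{[r_1 + \cdots + r_k]}\,(Np_1)^{[r_1]} \cdots (Np_k)^{[r_k]}}{N^{[r_1 + \cdots + r_k]}}$ 

where $0 \le r_i \le Np_i$, $i = 1, \ldots, k$, and $r_1 + \cdots + r_k \le n$. From (6.1.7) 
it can be verified that 

(6.1.8)  $\mu(x_i) = np_i, \qquad \sigma^2(x_i) = np_i(1 - p_i)\left(\dfrac{N - n}{N - 1}\right)$, 

$\operatorname{cov}(x_i, x_j) = -np_i p_j \left(\dfrac{N - n}{N - 1}\right)$. 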
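The moments in (6.1.8) can be checked by brute-force enumeration of the sample space of (6.1.6) for a small case. This is a sketch under stated assumptions: the class sizes `(5, 3, 4)`, the sample size, and all function names are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb


def kvariate_hypergeom_pf(xs, class_sizes, n):
    """p.f. (6.1.6). class_sizes[i] plays the role of N*p_i for the k+1 classes;
    xs lists x_1, ..., x_{k+1} and must sum to n."""
    num = 1
    for x, size in zip(xs, class_sizes):
        num *= comb(size, x)          # comb is 0 when x exceeds the class size
    return Fraction(num, comb(sum(class_sizes), n))


def first_two_moments(class_sizes, n):
    """Exact total probability, mean of x_1, variance of x_1 and cov(x_1, x_2)."""
    total = m1 = m2 = m11 = m12 = Fraction(0)
    for xs in product(range(n + 1), repeat=len(class_sizes)):
        if sum(xs) != n:
            continue                  # keep only points with x_1 + ... + x_{k+1} = n
        pr = kvariate_hypergeom_pf(xs, class_sizes, n)
        total += pr
        m1 += xs[0] * pr
        m2 += xs[1] * pr
        m11 += xs[0] ** 2 * pr
        m12 += xs[0] * xs[1] * pr
    return total, m1, m11 - m1 ** 2, m12 - m1 * m2


sizes, n = (5, 3, 4), 4               # hypothetical collection: N = 12, k + 1 = 3 classes
N = sum(sizes)
p1, p2 = Fraction(sizes[0], N), Fraction(sizes[1], N)
shrink = Fraction(N - n, N - 1)       # the factor (N - n)/(N - 1) in (6.1.8)
total, mean1, var1, cov12 = first_two_moments(sizes, n)
print(total == 1, mean1 == n * p1)
print(var1 == n * p1 * (1 - p1) * shrink, cov12 == -n * p1 * p2 * shrink)
```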


6.2 THE BINOMIAL DISTRIBUTION 

Suppose an “operation” or “trial,” when performed, will have one of 
two possible outcomes: $C$ or $\bar{C}$. For instance, in rolling a die we might 
denote by $C$ the occurrence of an ace, and by $\bar{C}$ the occurrence of any 
other face. Suppose $p$ is the probability of a $C$ and $q$ the probability of 
a $\bar{C}$. Then $p + q = 1$. 

If the “trial” is performed $n$ times, we will obtain a sequence of $n$ letters 
consisting of C’s and $\bar{C}$’s. There are $2^n$ possible sequences; these 
sequences form the basic sample space $R$, the sample points themselves 
being these $2^n$ possible sequences. Let $x$ be a random variable whose 
value for any given sequence is equal to the number of C’s in that sequence. 
If we assume that the $n$ “trials” in the sequence are statistically independent, 
the probability which would be assigned to any sequence for which 
$x = x'$ is 

(6.2.1)  $p^{x'} q^{n - x'}$ 

as may be seen by replacing $C$ by $p$ and $\bar{C}$ by $q$ in the sequence and 
multiplying the $p$’s and $q$’s in the resulting sequence. There are $\binom{n}{x'}$ sequences 
for which $x = x'$, and hence we obtain 

(6.2.2)  $P(x = x') = \binom{n}{x'} p^{x'} q^{n - x'}, \qquad x' = 0, 1, \ldots, n$. 

Hence $x$ is a discrete random variable whose p.f. $p(x)$ is given by the 
right-hand side of (6.2.2) (dropping dashes). It will be noted that $p(x)$ is 
the general term in the expansion of the binomial $(q + p)^n$, which means 
that the sum of $p(x)$ given by (6.2.2) over the sample space of the random 
variable $x$ is $(q + p)^n$, which, of course, is unity since $p + q = 1$. 
Accordingly, a distribution having p.f. (6.2.2) is called the binomial 
distribution and will be referred to as $Bi(n, p)$. (We use $Bi(a, b)$ for binomial 
distribution notation and $Be(a, b)$ for beta distribution notation to be 
introduced in Chapter 7.) Sequences of independent trials, each resulting 
in one of two possible outcomes, with constant probabilities from trial 
to trial are sometimes called Bernoulli trials after J. Bernoulli (1713) 
who first studied them. 

The characteristic function of (6.2.2) is 

(6.2.3)  $\varphi(t) = \mathcal{E}(e^{itx}) = \sum_{x=0}^{n} \binom{n}{x} (pe^{it})^{x} q^{n-x} = (q + pe^{it})^{n}$, 

from which the moments $\mu'_r$ can be found by applying (5.1.5). In particular, 
we have 

(6.2.4)  $\mu'_1 = np; \qquad \mu'_2 = np(q + np)$ 

from which we find the mean and variance of (6.2.2) to be 

(6.2.5)  $\mu(x) = np, \qquad \sigma^2(x) = npq$. 

If we consider $n$ as a parameter in (6.2.3) and write $\varphi(t)$ as $\varphi(t; n)$, then 
$\varphi(t; n)$ satisfies (5.3.7), which means that if $x_1$ and $x_2$ are two independent 
random variables having binomial distributions $Bi(n_1, p)$ and $Bi(n_2, p)$ 
respectively, then $x_1 + x_2$ has the binomial distribution $Bi(n_1 + n_2, p)$, 
an intuitively obvious result. Hence 

6.2.1 The binomial distribution $Bi(n, p)$ is reproductive with respect to $n$. 
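The reproductive property 6.2.1 can be illustrated without characteristic functions by convolving two binomial p.f.'s directly. The following sketch assumes hypothetical values $n_1 = 4$, $n_2 = 7$, $p = 1/6$; the helper names are illustrative.

```python
from fractions import Fraction
from math import comb


def binom_pf(x, n, p):
    """p.f. (6.2.2) of Bi(n, p), with q = 1 - p."""
    if x < 0 or x > n:
        return Fraction(0)
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def convolve(pf1, pf2):
    """p.f. of the sum of two independent non-negative integer random variables,
    each p.f. given as a list indexed by the value of the variable."""
    out = [Fraction(0)] * (len(pf1) + len(pf2) - 1)
    for i, a in enumerate(pf1):
        for j, b in enumerate(pf2):
            out[i + j] += a * b
    return out


p = Fraction(1, 6)                      # e.g. probability of an ace with a true die
n1, n2 = 4, 7                           # hypothetical numbers of trials
pf1 = [binom_pf(x, n1, p) for x in range(n1 + 1)]
pf2 = [binom_pf(x, n2, p) for x in range(n2 + 1)]
pf_sum = convolve(pf1, pf2)             # p.f. of x_1 + x_2
pf_direct = [binom_pf(x, n1 + n2, p) for x in range(n1 + n2 + 1)]
print(pf_sum == pf_direct)              # exact equality of the two p.f.'s
```

The convolution here is the direct counterpart of multiplying the characteristic functions $\varphi(t; n_1)$ and $\varphi(t; n_2)$.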





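It should be pointed out that the binomial distribution (6.2.2) is a 
limiting form of the hypergeometric distribution (6.1.1) as $N \to \infty$. For 
(6.1.1) may be expressed as 

(6.2.6)  $p(x) = \binom{n}{x} \dfrac{p\left(p - \frac{1}{N}\right) \cdots \left(p - \frac{x-1}{N}\right)\, q\left(q - \frac{1}{N}\right) \cdots \left(q - \frac{n-x-1}{N}\right)}{\left(1 - \frac{1}{N}\right)\left(1 - \frac{2}{N}\right) \cdots \left(1 - \frac{n-1}{N}\right)}$. 

Hence, holding $x$ and $n$ fixed we have the result that: 

6.2.2 The limit of the p.f. of the hypergeometric distribution $H(N, n; p)$ 
as $N \to \infty$ is the p.f. of the binomial distribution $Bi(n, p)$. 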


Since the numbers of mass points in the binomial and hypergeometric 
distributions are finite, 6.2.2 implies that if $P_H(x \in E)$ and $P_B(x \in E)$ are 
probabilities of a fixed event $E$ computed from $H(N, n; p)$ and $Bi(n, p)$ 
respectively, then $\lim_{N \to \infty} P_H(x \in E) = P_B(x \in E)$. Expressed in terms of 
sequences of random variables: if $x_N$ is a random variable having the 
hypergeometric distribution $H(N, n; p)$, then the sequence of random 
variables $(x_1, x_2, \ldots)$ converges in distribution to the random variable $x$ 
having the binomial distribution $Bi(n, p)$. 
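The limit in 6.2.2 can be observed numerically by holding $n$ and $p$ fixed and letting $N$ grow. In this sketch the values $n = 5$, $p = 0.25$ and the sequence of $N$ (chosen so that $Np$ is an integer) are assumptions made for illustration.

```python
from math import comb


def hypergeom_pf(x, N, n, Np):
    """p.f. (6.1.1) in floating point."""
    Nq = N - Np
    if x < 0 or x > Np or n - x < 0 or n - x > Nq:
        return 0.0
    return comb(Np, x) * comb(Nq, n - x) / comb(N, n)


def binom_pf(x, n, p):
    """p.f. (6.2.2) in floating point."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def max_gap(N, n, p):
    """Largest absolute difference between the two p.f.'s over x = 0, ..., n."""
    Np = round(N * p)                   # assumes N * p is (numerically) an integer
    return max(abs(hypergeom_pf(x, N, n, Np) - binom_pf(x, n, p))
               for x in range(n + 1))


for N in (20, 200, 2000, 20000):
    print(N, max_gap(N, 5, 0.25))       # the gap shrinks as N increases
```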

The binomial distribution is one of the most important distributions in 
statistics. Both the p.f. (6.2.2) and the c.d.f. form have been widely 
tabulated, the most extensive tabulations being those by Army Ordnance 
Corps (1952), Harvard Computation Laboratory (1955), National 
Bureau of Standards (1949), and by Romig (1953). 


6.3 THE MULTINOMIAL DISTRIBUTION 

Now suppose each “operation” or “trial” will result in one and only 
one of $k + 1$ mutually exclusive events $C_1, \ldots, C_{k+1}$, and let the probabilities 
corresponding to these events be $p_1, \ldots, p_{k+1}$, all $> 0$, and $p_1 + \cdots + p_{k+1} = 1$. 
Let $n$ independent trials be made and let $(x_1, \ldots, x_{k+1})$ be a 
$(k + 1)$-dimensional random variable whose components $x_1, \ldots, x_{k+1}$ denote the 
numbers of trials resulting in $C_1, \ldots, C_{k+1}$, respectively. The components 
$x_1, \ldots, x_{k+1}$ are linearly dependent since $x_1 + \cdots + x_{k+1} = n$. The basic 
sample space $R$ consists of $(k + 1)^{n}$ sample points, each point being a 
sequence of $n$ selections from $C_1, \ldots, C_{k+1}$, repetitions allowed. The 
probability associated with any sample point consisting of $x_1$ $C_1$’s, $\ldots$, 
$x_{k+1}$ $C_{k+1}$’s is 

(6.3.1)  $p_1^{x_1} \cdots p_{k+1}^{x_{k+1}}$. 

The number of such sample points in $R$ is evidently 

(6.3.2)  $\dfrac{n!}{x_1! \cdots x_{k+1}!}$. 

Since the amount of probability at each of these sample points is given by 
(6.3.1), it follows that the p.f. of the random variable $(x_1, \ldots, x_k)$ is given 
by 

(6.3.3)  $p(x_1, \ldots, x_k) = \dfrac{n!}{x_1! \cdots x_{k+1}!}\, p_1^{x_1} \cdots p_{k+1}^{x_{k+1}}$ 

where it is to be understood that $x_{k+1} = n - x_1 - \cdots - x_k$ and 
$p_{k+1} = 1 - p_1 - \cdots - p_k$. It will be noted that the expression for $p(x_1, \ldots, x_k)$ 
is the general term in the expansion of the multinomial $(p_1 + \cdots + p_{k+1})^{n}$. 
Accordingly, a distribution having p.f. (6.3.3) is known as a k-variate 
multinomial distribution, which will be conveniently referred to as 
$M(n; p_1, \ldots, p_k)$. The mass points of this distribution, that is, the sample 
space of $(x_1, \ldots, x_k)$, are the lattice points in $R_k$ contained in the simplex 
$x_i \ge 0$, $i = 1, \ldots, k$, $\sum_{i=1}^{k} x_i \le n$, that is, all points in the simplex whose 
coordinates are positive integers or zero. 

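The characteristic function of the multinomial distribution is 

$\varphi(t_1, \ldots, t_k) = \sum \dfrac{n!}{x_1! \cdots x_{k+1}!} (p_1 e^{it_1})^{x_1} \cdots (p_k e^{it_k})^{x_k} (p_{k+1})^{x_{k+1}}$ 

where the summation is performed over all points in the sample space of 
$(x_1, \ldots, x_k)$. But this is merely the sum of all of the terms in the expansion 
of the multinomial $(p_1 e^{it_1} + \cdots + p_k e^{it_k} + p_{k+1})^{n}$. Therefore, 

(6.3.4)  $\varphi(t_1, \ldots, t_k) = (p_1 e^{it_1} + \cdots + p_k e^{it_k} + p_{k+1})^{n}$. 

By applying (5.2.3) one can find the joint moments of $x_1, \ldots, x_k$. In 
particular, we find 

(6.3.5)  $\mu(x_i) = np_i, \qquad \sigma^2(x_i) = np_i(1 - p_i)$, 

$\operatorname{cov}(x_i, x_j) = -np_i p_j$. 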
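The moments in (6.3.5) can be confirmed by summing over the lattice points of (6.3.3) for a small case. The sketch below assumes hypothetical probabilities $(1/2, 1/3, 1/6)$ and $n = 5$; function names are illustrative only.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def multinomial_pf(xs, ps):
    """p.f. (6.3.3); xs and ps list all k+1 classes, so sum(xs) = n and sum(ps) = 1."""
    coef = factorial(sum(xs))
    prob = Fraction(1)
    for x, p in zip(xs, ps):
        coef //= factorial(x)          # exact: each partial quotient is an integer
        prob *= p ** x
    return coef * prob


def multinomial_moments(n, ps):
    """Exact total probability, mean and variance of x_1, and cov(x_1, x_2)."""
    total = m1 = m2 = m11 = m12 = Fraction(0)
    for xs in product(range(n + 1), repeat=len(ps)):
        if sum(xs) != n:
            continue                   # keep only points with x_1 + ... + x_{k+1} = n
        pr = multinomial_pf(xs, ps)
        total += pr
        m1 += xs[0] * pr
        m2 += xs[1] * pr
        m11 += xs[0] ** 2 * pr
        m12 += xs[0] * xs[1] * pr
    return total, m1, m11 - m1 ** 2, m12 - m1 * m2


ps = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))   # hypothetical p_1, p_2, p_3
n = 5
total, mean1, var1, cov12 = multinomial_moments(n, ps)
print(total == 1, mean1 == n * ps[0])
print(var1 == n * ps[0] * (1 - ps[0]), cov12 == -n * ps[0] * ps[1])
```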

If $n$ is considered as a parameter in (6.3.4), it can be seen that 

6.3.1 The multinomial distribution $M(n; p_1, \ldots, p_k)$ is reproductive with 
respect to $n$. 

6.4 THE POISSON DISTRIBUTION 


In the binomial distribution (6.2.2) suppose, for a fixed $x$, we let $n \to \infty$ 
and $p \to 0$ through sequences of values such that $np = \mu$, for a fixed $\mu$. 
Then the limit of (6.2.2) is the following probability function 

(6.4.1)  $p(x) = \dfrac{\mu^{x} e^{-\mu}}{x!}$. 

This is the p.f. of what is known as the Poisson distribution, named after 
its discoverer Poisson (1837). It will be convenient to denote the Poisson 
distribution having p.f. (6.4.1) by $Po(\mu)$. [We insert $o$ in $Po(\mu)$ to avoid 
confusion with probability notation $P(\mu)$.] Since $n$ is allowed to increase 
without limit in obtaining (6.4.1), we can fix $x$ at any positive integer 
desired. Thus, the mass points of $x$ are $0, 1, 2, \ldots$. To establish (6.4.1), 
note that the right-hand side of (6.2.2) can be written as 

(6.4.2)  $\left[1 \cdot \left(1 - \frac{1}{n}\right) \cdots \left(1 - \frac{x-1}{n}\right)\right] \dfrac{\mu^{x}}{x!} \left(1 - \frac{\mu}{n}\right)^{n} \left(1 - \frac{\mu}{n}\right)^{-x}$. 

Allowing $n \to \infty$ and $p \to 0$ so that $np = \mu$, we see that the limit of (6.4.2) 
is the distribution given in (6.4.1). It can be readily verified that the sum 
of the probabilities given by (6.4.1) for $x = 0, 1, 2, \ldots$ is unity. 
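The passage to the limit from (6.2.2) to (6.4.1) can be watched numerically by fixing $\mu$ and setting $p = \mu / n$ for increasing $n$. The value $\mu = 2$, the range of $x$, and the helper names below are assumptions for illustration.

```python
from math import comb, exp, factorial


def binom_pf(x, n, p):
    """p.f. (6.2.2) in floating point."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pf(x, mu):
    """p.f. (6.4.1)."""
    return mu ** x * exp(-mu) / factorial(x)


def poisson_gap(n, mu, x_max=10):
    """Largest absolute difference between Bi(n, mu/n) and Po(mu) for x = 0..x_max."""
    p = mu / n
    return max(abs(binom_pf(x, n, p) - poisson_pf(x, mu)) for x in range(x_max + 1))


for n in (10, 100, 1000, 10000):
    print(n, poisson_gap(n, 2.0))       # the gap shrinks as n grows with np = mu fixed
```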

The characteristic function of the Poisson distribution is 

(6.4.3)  $\varphi(t) = \sum_{x=0}^{\infty} e^{itx}\, \dfrac{\mu^{x} e^{-\mu}}{x!} = e^{\mu(e^{it} - 1)}$ 

from which one finds 

(6.4.4)  $\mu(x) = \mu, \qquad \sigma^2(x) = \mu$. 


Considering $\mu$ as a parameter in the characteristic function (6.4.3) it can 
be seen that the characteristic function satisfies (5.3.7) and hence 

6.4.1 The Poisson distribution $Po(\mu)$ is reproductive with respect to $\mu$. 
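Statement 6.4.1 says that the sum of independent $Po(\mu_1)$ and $Po(\mu_2)$ variables is $Po(\mu_1 + \mu_2)$. A small numerical sketch of this, with hypothetical values $\mu_1 = 1.5$ and $\mu_2 = 2.5$, compares the convolution of the two p.f.'s with (6.4.1) evaluated at $\mu_1 + \mu_2$:

```python
from math import exp, factorial, isclose


def poisson_pf(x, mu):
    """p.f. (6.4.1)."""
    return mu ** x * exp(-mu) / factorial(x)


def sum_pf(z, mu1, mu2):
    """p.f. at z of x_1 + x_2 for independent x_1 ~ Po(mu1) and x_2 ~ Po(mu2).
    The convolution sum is finite because both variables are non-negative."""
    return sum(poisson_pf(x, mu1) * poisson_pf(z - x, mu2) for x in range(z + 1))


mu1, mu2 = 1.5, 2.5                     # hypothetical parameter values
agree = all(isclose(sum_pf(z, mu1, mu2), poisson_pf(z, mu1 + mu2), rel_tol=1e-9)
            for z in range(20))
print(agree)
```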

Remark. Since the binomial distribution is an approximation to the 
hypergeometric distribution for large $N$, and the Poisson distribution is an 
approximation to the binomial distribution for large $n$ and small $p$, one might expect the 
relatively simple Poisson distribution to be an approximation to the 
hypergeometric distribution under certain conditions. As a matter of fact, roughly 
speaking, this is the case if (i) $n$ is large, (ii) $p$ small (with $np = \mu$), and (iii) $N$ is 
much larger than $n$. These conditions are reasonably well fulfilled in such 
important applications as the sampling of lots of mass-produced articles, in 
which case $N$ is the size of the lot, $n$ the size of a sample of articles drawn from the 
lot, and $p$ is the fraction of defective articles in the lot. 

Like the binomial distribution, the Poisson distribution is a very important 
one in applications of probability and statistics. There are many problems not 
only in the distribution of numbers of defectives in samples of industrial products 
but in the distribution of bacterial colonies per unit volume of a culture or liquid, 
number of telephone calls initiated per unit time, number of bomb fragments 
striking a target per unit area, etc., in which the Poisson distribution arises. In 
all these situations we essentially have a large number $n$ of independent or 
nearly independent “trials,” a small probability $p$ of a “success” in any given 
trial, and we ask for the probability of getting $x$ “successes” in $n$ trials. 

The p.f. and c.d.f. of the Poisson distribution have been extensively 
tabulated by Molina (1942). 

6.5 DISCRETE WAITING-TIME DISTRIBUTIONS 
(a) The Hypergeometric Case 

Suppose $\Pi$ is a set of $N$ elements consisting of $Np$ C’s and $Nq$ $\bar{C}$’s where 
$p > 0$, $q > 0$, and $p + q = 1$. Let elements be drawn successively until 
exactly $k$ C’s are drawn. We wish to determine the probability that 
exactly $x$ elements will have to be drawn to accomplish this objective. 
The event points of our basic sample space $R$ will consist of all possible 
sequences of C’s and $\bar{C}$’s which could be produced by successively drawing 
all $N$ elements from $\Pi$ until $\Pi$ is exhausted. There are 

(6.5.1)  $\binom{N}{Np}$ 

sample points in $R$. 

For each sample point, the value of our random variable $x$ is equal to 
the number of elements drawn in order to obtain exactly $k$ C’s. For any 
value $x'$ satisfying both inequalities $0 \le k - 1 \le x' - 1$ and $0 \le Np - k \le N - x'$, 
the number of sample points for which $x = x'$ is seen to be 

(6.5.2)  $\binom{x' - 1}{k - 1}\binom{N - x'}{Np - k}$ 

by combinatorial analysis. If all sample points in $R$ are assigned equal 
probabilities, namely, $1 / \binom{N}{Np}$, we have 

(6.5.3)  $P(x = x') = \dfrac{\binom{x' - 1}{k - 1}\binom{N - x'}{Np - k}}{\binom{N}{Np}}$ 

which (dropping dashes) is therefore the p.f. $p(x)$ of the random variable $x$. 
The distribution having (6.5.3) as its p.f. will be called the hypergeometric 
waiting-time distribution. A specified value $x'$ of the random variable $x$ is 
essentially the number of trials we have to wait through in order to obtain 
$k$ C’s. 

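To verify that 

(6.5.4)  $p(x) = \dfrac{\binom{x - 1}{k - 1}\binom{N - x}{Np - k}}{\binom{N}{Np}}$ 

is a p.f., where the sample space of $x$ is the set of integers $k, k + 1, \ldots, Nq + k$, 
$1 \le k \le Np$, we must show that 

(6.5.5)  $\sum_{x=k}^{Nq+k} \binom{x - 1}{k - 1}\binom{N - x}{Np - k} = \binom{N}{Np}$. 

Note that for $|t| < 1$, we have 

(6.5.6)  $(1 - t)^{-k} = \sum_{x=k}^{\infty} \binom{x - 1}{k - 1} t^{x - k}, \qquad (1 - t)^{-(Np - k + 1)} = \sum_{y=-\infty}^{Nq+k} \binom{N - y}{Np - k} t^{Nq + k - y}$ 

and 

(6.5.7)  $(1 - t)^{-(Np + 1)} = \sum_{z=0}^{\infty} \binom{Np + z}{Np} t^{z}$. 

It will be seen that the coefficient of $t^{Nq}$ in the expansion of $(1 - t)^{-(Np+1)}$ 
given by (6.5.7) is $\binom{N}{Np}$. By multiplying the two equations in (6.5.6) 
we obtain 

(6.5.8)  $(1 - t)^{-(Np+1)} = \sum_{x} \sum_{y} \binom{x - 1}{k - 1}\binom{N - y}{Np - k} t^{Nq + x - y}$. 

The coefficient of $t^{Nq}$ in the expansion of $(1 - t)^{-(Np+1)}$ given by (6.5.8) 
is the sum of the coefficients $\binom{x-1}{k-1}\binom{N-y}{Np-k}$ for all possible pairs 
of integers $(x, y)$ for which $x = y$, that is, 

(6.5.9)  $\sum_{x=k}^{Nq+k} \binom{x - 1}{k - 1}\binom{N - x}{Np - k}$. 

Since one expansion of $(1 - t)^{-(Np+1)}$ gives $\binom{N}{Np}$ as the coefficient of $t^{Nq}$ 
whereas the other gives (6.5.9), we therefore obtain (6.5.5), and hence 
$p(x)$ as defined in (6.5.4) is a p.f. 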

The distribution whose p.f. is given by (6.5.4) is another example of a 
discrete distribution whose characteristic function is not useful for finding 
moments. But the moments can be found by a method similar to that 
used in Section 6.1. In this case it is perhaps simpler to evaluate 
$\mathcal{E}[(x - k)^{[r]}]$, the $r$th factorial moment of $x$ with respect to $k$, and then find 
the ordinary moments from these factorial moments. We have 

(6.5.10)  $\mathcal{E}[(x - k)^{[r]}] = \dfrac{(k + r - 1)^{[r]}}{\binom{N}{Np}} \left( \sum_{x} \binom{x - 1}{k + r - 1}\binom{N - x}{Np - k} \right)$. 

If we use the fact that the sum of the numerator on the right of (6.5.4) 
over the sample space of $x$ has the value $\binom{N}{Np}$, it is evident that the sum 
inside ( ) has the value $\binom{N}{Np + r}$. Therefore we obtain 

(6.5.11)  $\mathcal{E}[(x - k)^{[r]}] = (k + r - 1)^{[r]}\, \dfrac{\binom{N}{Np + r}}{\binom{N}{Np}}$. 

From this expression for $r = 1, 2$ one finds the mean of $x$ to be 

(6.5.12)  $\mu(x) = \dfrac{k(N + 1)}{Np + 1}$ 

and the variance to be 

(6.5.13)  $\sigma^2(x) = \dfrac{Nq(Nq - 1)k(k + 1)}{(Np + 2)(Np + 1)} + \dfrac{k(2k + 1)(N + 1)}{Np + 1} - \dfrac{k^2(N + 1)^2}{(Np + 1)^2} - k(k + 1)$. 
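The identity (6.5.5) and the moments (6.5.12) and (6.5.13) can be checked exactly for a small collection. The sketch below is illustrative only: the parameters $N = 10$, $Np = 4$, $k = 2$ and the function names are assumptions.

```python
from fractions import Fraction
from math import comb


def hg_wait_pf(x, N, Np, k):
    """p.f. (6.5.4): probability that the k-th C appears on draw number x."""
    if x < k or N - x < Np - k:
        return Fraction(0)              # outside k, k+1, ..., Nq + k
    return Fraction(comb(x - 1, k - 1) * comb(N - x, Np - k), comb(N, Np))


def hg_wait_summary(N, Np, k):
    """Exact (total probability, mean, variance) over x = k, ..., Nq + k."""
    Nq = N - Np
    pf = {x: hg_wait_pf(x, N, Np, k) for x in range(k, Nq + k + 1)}
    total = sum(pf.values())
    mean = sum(x * px for x, px in pf.items())
    var = sum(x * x * px for x, px in pf.items()) - mean ** 2
    return total, mean, var


def variance_formula(N, Np, k):
    """Right-hand side of (6.5.13)."""
    Nq = N - Np
    return (Fraction(Nq * (Nq - 1) * k * (k + 1), (Np + 2) * (Np + 1))
            + Fraction(k * (2 * k + 1) * (N + 1), Np + 1)
            - Fraction(k * k * (N + 1) ** 2, (Np + 1) ** 2)
            - k * (k + 1))


N, Np, k = 10, 4, 2                     # hypothetical collection and target count
total, mean, var = hg_wait_summary(N, Np, k)
print(total == 1)                                   # (6.5.5)
print(mean == Fraction(k * (N + 1), Np + 1))        # (6.5.12)
print(var == variance_formula(N, Np, k))            # (6.5.13)
```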


(b) The Binomial Case 

Suppose “trials” are performed successively and the outcome of each 
trial is either a $C$ or a $\bar{C}$, the trials being independent. Let $P(C) = p$ and 
$P(\bar{C}) = q$ where $p + q = 1$. Our basic sample space $R$ is the set of all 
possible sequences (sample points) of C’s and $\bar{C}$’s which could occur in 
an indefinitely long sequence of trials. Let $x$ be a random variable defined 
for each sample point as the number of trials performed in order to obtain 
exactly $k$ C’s. 

Let $E$ be the event in $R$ of getting $k - 1$ C’s in the first $x - 1$ trials and 
$F$ the event of getting a $C$ on the $x$th trial. If $G$ is the event of having to 
make exactly $x$ trials to get exactly $k$ C’s, then $G = E \cap F$, and 
$P(G) = P(E \cap F) = P(E) \cdot P(F \mid E)$. But $P(F \mid E) = p$ and 

(6.5.14)  $P(E) = \binom{x - 1}{k - 1} p^{k-1} q^{x-k}$. 

Hence, if $p(x)$ is the p.f. of $x$, we have $p(x) = P(G)$, that is, 

(6.5.15)  $p(x) = \binom{x - 1}{k - 1} p^{k} q^{x-k}, \qquad x = k, k + 1, \ldots$. 

Note that the expression for $p(x)$ is the $(x - k + 1)$th term obtained in the 
expression $p^{k}(1 - q)^{-k}$ if $(1 - q)^{-k}$ is expanded into a series in powers of $q$. 
The sum of $p(x)$ over the values $x = k, k + 1, \ldots$ is $p^{k}(1 - q)^{-k} = p^{k} p^{-k} = 1$. 

The distribution having the p.f. defined by (6.5.15) will be called the 
binomial waiting-time distribution. The p.f. $p(x)$ is simply the probability 
that one must wait through $x$ (independent) trials in order to obtain $k$ C’s. 
The distribution is also called the negative binomial distribution and the 
Pascal distribution. 

The characteristic function of (6.5.15) is 

$\varphi(t) = (pe^{it})^{k} (1 - qe^{it})^{-k}$ 

which reduces to 

(6.5.16)  $\varphi(t) = p^{k} (e^{-it} - q)^{-k}$. 

By the usual differentiation procedure it can be verified that 

(6.5.17)  $\mu(x) = \dfrac{k}{p}, \qquad \sigma^2(x) = \dfrac{kq}{p^2}$. 

Note that the mean and variance here are the limiting forms of those in 
(6.5.12) and (6.5.13), as $N \to \infty$. In fact, the p.f. of the binomial 
waiting-time distribution (6.5.15) is, as one would expect, the limiting form of the 
p.f. of the hypergeometric waiting-time distribution (6.5.3) as $N \to \infty$. 
The reader will find it instructive to verify these statements. 
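A numerical check does not replace the verification, but it shows the limiting behaviour concretely. This sketch assumes hypothetical values $k = 3$, $p = 0.25$ and a truncation of the infinite sample space at $x = 2000$ for the moment sums; all names are illustrative.

```python
from math import comb


def hg_wait_pf(x, N, Np, k):
    """p.f. (6.5.3) of the hypergeometric waiting-time distribution."""
    if x < k or N - x < Np - k:
        return 0.0
    return comb(x - 1, k - 1) * comb(N - x, Np - k) / comb(N, Np)


def binom_wait_pf(x, k, p):
    """p.f. (6.5.15) of the binomial waiting-time distribution."""
    return comb(x - 1, k - 1) * p ** k * (1 - p) ** (x - k)


def wait_gap(N, k, p, x_max=60):
    """Largest absolute difference between (6.5.3) and (6.5.15) for x = k..x_max."""
    Np = round(N * p)                   # assumes N * p is (numerically) an integer
    return max(abs(hg_wait_pf(x, N, Np, k) - binom_wait_pf(x, k, p))
               for x in range(k, x_max + 1))


k, p = 3, 0.25
for N in (40, 400, 4000):
    print(N, wait_gap(N, k, p))         # shrinks as N grows

# Mean and variance of (6.5.15), truncating the series at x = 2000 (an assumption
# that is harmless here because the omitted tail is negligible).
xs = range(k, 2001)
mean = sum(x * binom_wait_pf(x, k, p) for x in xs)
var = sum(x * x * binom_wait_pf(x, k, p) for x in xs) - mean ** 2
print(mean, k / p)                      # compare with (6.5.17)
print(var, k * (1 - p) / p ** 2)
```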

We have considered only the simplest kind of a discrete waiting-time 
distribution. Problems arise in which we wish to know not only the 
probability function of the number of trials required to obtain a specified 
number of C’s but the probability function of the number of trials required 
to obtain specified numbers of C’s and of $\bar{C}$’s. Or in the case of more than 
two classes of outcomes, we may wish to know the probability function of 
the number of trials required to obtain specified numbers in each class. 
Problems of this type are much more complicated than those we have 
discussed. Some of them have been treated by Girshick, Mosteller, and 
Savage (1946), Haldane (1945), Laplace (1814), and McCarthy (1947). 

6.6 DISTRIBUTIONS IN THE THEORY OF RUNS 

The theory of runs plays an important role in certain problems of 
nonparametric statistical inference as we shall see in Chapter 14. But 
the main distributions in the theory of runs can be conveniently introduced 
here. The results which will be presented below are due primarily to 
Mood (1940). 


(a) Runs of Two Kinds of Elements 


Consider a sample space $R$ whose sample points consist of the set of all 
$\binom{n}{n_1}$ permutations of $n_1$ C’s and $n_2$ $\bar{C}$’s, where $n_1 + n_2 = n$. Any particular 
sample point will be a sequence of C’s and $\bar{C}$’s which will consist of 
alternating runs of C’s and $\bar{C}$’s. The length of a run is the number of 
elements in it. Let $r_{1j}$ denote the number of runs of C’s of length $j$, and $r_{2j}$ 
the number of runs of $\bar{C}$’s of length $j$. For instance, if the sequence is 


cccccccccccccC 


we have $n_1 = 8$, $n_2 = 6$, $r_{11} = 1$, $r_{12} = 2$, $r_{13} = 1$, $r_{21} = 2$, $r_{22} = 2$, all 
other $r$’s being zero. 

It will be seen from the definition of $n_1$, $n_2$, $r_{1j}$ and $r_{2j}$ that $\sum_j j r_{1j} = n_1$ 
and $\sum_j j r_{2j} = n_2$. Let $r_1 = \sum_j r_{1j}$, $r_2 = \sum_j r_{2j}$ be the total numbers of 
runs of C’s and of $\bar{C}$’s respectively. For any assignment of probabilities 
to the sample points in $R$ we can set up random variables $r_{1j}$ and $r_{2j}$ which 
will have specific values as defined above for any given sample point in $R$. 
Our problem is to find the p.f. of these random variables when all sample 
points in $R$ are assigned equal probabilities. For a given set of values of 
the $r_{1j}$ the number of ways of arranging the $r_1$ runs of C’s is 

(6.6.1)  $\dfrac{r_1!}{r_{11}! \cdots r_{1n_1}!}$ 

and similarly, the number of ways of arranging the $r_2$ runs of $\bar{C}$’s is 

(6.6.2)  $\dfrac{r_2!}{r_{21}! \cdots r_{2n_2}!}$. 

Note that $r_1$ and $r_2$ cannot differ from each other by more than unity; 
for if they do, at least two runs of one kind of element would have to be 
adjacent, contrary to the definition of a run. If $r_1 = r_2$, a given arrangement 
of runs of C’s can be fitted into a given arrangement of runs of $\bar{C}$’s 
in two ways, such that the entire sequence of C’s and $\bar{C}$’s will begin with 
either a run of C’s or a run of $\bar{C}$’s. 

If we define the function $\gamma(r_1, r_2)$ to be the number of ways of arranging 
$r_1$ indistinguishable objects of one kind and $r_2$ indistinguishable objects of 
a second kind so that no two objects of the same kind appear together, 
then we have 

(6.6.3)  $\gamma(r_1, r_2) = \begin{cases} 0, & |r_1 - r_2| > 1 \\ 1, & |r_1 - r_2| = 1 \\ 2, & r_1 = r_2. \end{cases}$ 

The total number of ways of getting $r_{1j}$ runs of C’s of lengths $j = 1, \ldots, n_1$ 
and of getting $r_{2j}$ runs of $\bar{C}$’s of lengths $j = 1, \ldots, n_2$ is therefore 

(6.6.4)  $\dfrac{r_1!}{r_{11}! \cdots r_{1n_1}!} \cdot \dfrac{r_2!}{r_{21}! \cdots r_{2n_2}!}\, \gamma(r_1, r_2)$. 

But there are $\binom{n}{n_1}$ possible arrangements of $n_1$ C’s and $n_2$ $\bar{C}$’s, that is, 
$\binom{n}{n_1}$ sample points in $R$. If these arrangements are all assigned equal 
probabilities, then the $(n_1 + n_2)$-dimensional random variable 
$(r_{ij};\ j = 1, \ldots, n_i,\ i = 1, 2)$ will have the following p.f.: 

(6.6.5)  $p(\{r_{1j}\}, \{r_{2j}\}) = \dfrac{r_1!}{r_{11}! \cdots r_{1n_1}!} \cdot \dfrac{r_2!}{r_{21}! \cdots r_{2n_2}!} \cdot \dfrac{n_1!\, n_2!}{n!}\, \gamma(r_1, r_2)$. 
If we are interested only in the p.f. of the runs of C’s, that is the $r_{1j}$, we 
must take the marginal distribution of $r_{11}, \ldots, r_{1n_1}$ in (6.6.5), that is, we 
must sum (6.6.5) with respect to $r_{21}, \ldots, r_{2n_2}$. This means that we must 
sum (6.6.2) for all $r_{21}, \ldots, r_{2n_2}$ such that $\sum_j j r_{2j} = n_2$ and $\sum_j r_{2j} = r_2$. In 
order to do this we make use of the following identity in $s$ which holds 
for values of $s$ near zero: 

(6.6.6)  $(s + s^2 + \cdots)^{r_2} \equiv s^{r_2}(1 - s)^{-r_2} \equiv s^{r_2} \sum_{t=0}^{\infty} \dfrac{(r_2 + t - 1)!}{(r_2 - 1)!\, t!}\, s^{t}$. 

Now the coefficient of $s^{n_2}$ in the first expression of (6.6.6) is the sum of 
(6.6.2) with respect to the $r_{21}, \ldots, r_{2n_2}$ subject to the restrictions $\sum_j j r_{2j} = n_2$ 
and $\sum_j r_{2j} = r_2$. But the coefficient of $s^{n_2}$ from the first expression in (6.6.6) 
must equal that of $s^{n_2}$ in the last expression in (6.6.6), which is 

(6.6.7)  $\dfrac{(n_2 - 1)!}{(r_2 - 1)!\,(n_2 - r_2)!}$. 

Therefore the p.f. of $(r_{1j}, j = 1, \ldots, n_1)$ and $r_2$ is 

(6.6.8)  $p(\{r_{1j}\}, r_2) = \dfrac{r_1!}{r_{11}! \cdots r_{1n_1}!} \cdot \dfrac{(n_2 - 1)!}{(r_2 - 1)!\,(n_2 - r_2)!} \cdot \dfrac{n_1!\, n_2!}{n!}\, \gamma(r_1, r_2)$. 

Finally, to obtain the p.f. of the $r_{1j}$ we must sum (6.6.8) with respect to 
$r_2$. Making use of (6.6.3), we have, after some simplification, 

(6.6.9)  $\sum_{r_2=1}^{n_2} \dfrac{(n_2 - 1)!}{(r_2 - 1)!\,(n_2 - r_2)!}\, \gamma(r_1, r_2) = \dfrac{(n_2 + 1)!}{r_1!\,(n_2 - r_1 + 1)!}$. 

Therefore, we have the following result for the p.f. of $(r_{1j}, j = 1, \ldots, n_1)$: 

(6.6.10)  $p(\{r_{1j}\}) = \dfrac{r_1!}{r_{11}! \cdots r_{1n_1}!} \cdot \dfrac{\binom{n_2 + 1}{r_1}}{\binom{n}{n_1}}$. 

A similar result holds, of course, for the p.f. of the $r_{2j}$. 

Another important distribution is that of $r_1$ and $r_2$. The p.f. of this 
distribution is obtained by summing (6.6.8) with respect to the $r_{1j}$ subject 
to the conditions $\sum_j j r_{1j} = n_1$ and $\sum_j r_{1j} = r_1$. The procedure here is similar 
to that used in summing (6.6.5) to obtain (6.6.8) and it yields 

(6.6.11)  $p(r_1, r_2) = \dfrac{\binom{n_1 - 1}{r_1 - 1}\binom{n_2 - 1}{r_2 - 1}\, \gamma(r_1, r_2)}{\binom{n}{n_1}}$. 

The p.f. of $r_1$, the total number of runs of C’s, is found (by summing 
(6.6.11) with respect to $r_2$) to be 

(6.6.12)  $p(r_1) = \dfrac{\binom{n_1 - 1}{r_1 - 1}\binom{n_2 + 1}{r_1}}{\binom{n}{n_1}}$. 

A similar expression holds for the p.f. of $r_2$. 
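Formula (6.6.12) can be compared with a direct enumeration of all $\binom{n}{n_1}$ equally likely arrangements for small $n_1$ and $n_2$. The sketch below assumes $n_1 = 5$, $n_2 = 4$; an arrangement is represented by the positions occupied by the C’s, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def runs_of_C(c_positions):
    """Number of runs of C's: a run starts wherever a C is not preceded by a C."""
    pos = set(c_positions)
    return sum(1 for i in pos if i - 1 not in pos)


def r1_pf_formula(r1, n1, n2):
    """p(r_1) of (6.6.12)."""
    return Fraction(comb(n1 - 1, r1 - 1) * comb(n2 + 1, r1), comb(n1 + n2, n1))


def r1_pf_enumerated(n1, n2):
    """p.f. of r_1 obtained by listing every arrangement of n1 C's among n places."""
    n = n1 + n2
    counts = {}
    for c_positions in combinations(range(n), n1):
        r1 = runs_of_C(c_positions)
        counts[r1] = counts.get(r1, 0) + 1
    total = comb(n, n1)
    return {r1: Fraction(c, total) for r1, c in counts.items()}


n1, n2 = 5, 4                           # hypothetical numbers of C's and C-bar's
enumerated = r1_pf_enumerated(n1, n2)
print(all(enumerated[r1] == r1_pf_formula(r1, n1, n2) for r1 in enumerated))
print(sum(r1_pf_formula(r1, n1, n2) for r1 in range(1, n1 + 1)) == 1)
```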

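To verify that the sum of $p(r_1)$ over the sample space of $r_1$ is unity we 
make use of the relation 

(6.6.13)  $\sum_{j} \binom{A}{j + m}\binom{B}{j} = \binom{A + B}{B + m}$ 

obtained by equating the coefficients of $s^{m}$ in the following identity 

(6.6.14)  $(1 + s)^{A}\left(1 + \dfrac{1}{s}\right)^{B} \equiv \dfrac{(1 + s)^{A + B}}{s^{B}}$. 

It follows from (6.6.13) that 

$\sum_{r_1=1}^{n_1} \binom{n_1 - 1}{r_1 - 1}\binom{n_2 + 1}{r_1} = \binom{n}{n_1}$ 

which is equivalent to the statement 

$\sum_{r_1=1}^{n_1} p(r_1) = 1$. 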

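The easiest way to find the moments of $r_1$ is by means of factorial 
moments. For the $g$th factorial moment we have 

(6.6.15)  $\mu_{[g]} = \mathcal{E}(r_1^{[g]}) = \dfrac{(n_2 + 1)^{[g]}}{\binom{n}{n_1}} \sum_{r_1} \binom{n_1 - 1}{r_1 - 1}\binom{n_2 + 1 - g}{r_1 - g}$. 

But it follows from (6.6.13) that 

$\sum_{r_1} \binom{n_1 - 1}{r_1 - 1}\binom{n_2 + 1 - g}{r_1 - g} = \binom{n - g}{n_1 - g}$. 

Therefore 

(6.6.16)  $\mu_{[g]} = (n_2 + 1)^{[g]}\, \dfrac{\binom{n - g}{n_1 - g}}{\binom{n}{n_1}}$ 

from which we find 

(6.6.17)  $\mu(r_1) = \dfrac{n_1(n_2 + 1)}{n}, \qquad \sigma^2(r_1) = \dfrac{n_1(n_1 - 1)\,n_2(n_2 + 1)}{n^2(n - 1)}$. 

Similar formulas exist for the factorial moment, the mean and variance 
of $r_2$. 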

By similar methods applied to (6.6.11) one can find that the general 
factorial moment of $(r_1 - 1)$ and $(r_2 - 1)$ is given by 

(6.6.18)  $\mathcal{E}[(r_1 - 1)^{[g_1]}(r_2 - 1)^{[g_2]}] = \dfrac{(n_1 - 1)^{[g_1]}(n_2 - 1)^{[g_2]} \binom{n - g_1 - g_2}{n_1 - g_2}}{\binom{n}{n_1}}$. 

In certain kinds of problems, the random variable u = + r^, the 

total number of runs, is important. 

The mean and variance of $u$ can be obtained at once from (6.6.17) and
(6.6.18) by using formulas (3.4.2) and (3.4.3). We have

(6.6.19)    $\mu(u) = \mu(r_1) + \mu(r_2) = \dfrac{2n_1n_2}{n} + 1$

            $\sigma^2(u) = \sigma^2(r_1) + \sigma^2(r_2) + 2\,\mathrm{cov}(r_1, r_2) = \dfrac{2n_1n_2(2n_1n_2 - n)}{n^2(n-1)}.$

As a matter of fact the p.f. of $u$ can be found from (6.6.11) by summing
$p(r_1, r_2)$ along the line $r_1 + r_2 = u$ in the $r_1r_2$-plane. It will be seen that if
$u$ is even there is only one point at which $p(r_1, r_2)$ has a value different from
zero, and if $u$ is odd there are two such points. The p.f. of $u$ can be written
down at once as

(6.6.20)    $p(u) = \dfrac{2\binom{n_1-1}{u/2-1}\binom{n_2-1}{u/2-1}}{\binom{n}{n_1}}$,    if $u$ is even

            $p(u) = \dfrac{\binom{n_1-1}{(u-1)/2}\binom{n_2-1}{(u-3)/2} + \binom{n_1-1}{(u-3)/2}\binom{n_2-1}{(u-1)/2}}{\binom{n}{n_1}}$,    if $u$ is odd.

This result was originally obtained by Stevens (1939). Various tabulations
concerning $p(u)$ have been made by Swed and Eisenhart (1943).
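The p.f. of the total number of runs can be checked numerically against its stated mean and variance, $2n_1n_2/n + 1$ and $2n_1n_2(2n_1n_2 - n)/[n^2(n-1)]$. The following is a minimal sketch in Python using exact rational arithmetic; the function names `run_count_pf` and `run_count_moments` are illustrative choices and not part of the text, and $n_1, n_2 \ge 1$ is assumed.

```python
from fractions import Fraction
from math import comb


def run_count_pf(u, n1, n2):
    """Probability of exactly u runs in a random ordering of n1 C's and n2 C-bar's."""
    total = comb(n1 + n2, n1)
    if u < 2:
        return Fraction(0)
    if u % 2 == 0:
        # u even: r1 = r2 = u/2, and either kind may start the sequence.
        h = u // 2
        ways = 2 * comb(n1 - 1, h - 1) * comb(n2 - 1, h - 1)
    else:
        # u odd: the two run counts differ by one, in either direction.
        hi, lo = (u - 1) // 2, (u - 3) // 2
        ways = (comb(n1 - 1, hi) * comb(n2 - 1, lo)
                + comb(n1 - 1, lo) * comb(n2 - 1, hi))
    return Fraction(ways, total)


def run_count_moments(n1, n2):
    """Exact total probability, mean and variance of u computed from the p.f."""
    probs = {u: run_count_pf(u, n1, n2) for u in range(2, n1 + n2 + 1)}
    total = sum(probs.values())
    mean = sum(u * p for u, p in probs.items())
    var = sum((u - mean) ** 2 * p for u, p in probs.items())
    return total, mean, var
```

Calling `run_count_moments(5, 3)` returns the total probability together with the exact mean and variance as fractions, which can be compared directly with the closed forms.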

Throughout the foregoing discussion $n_1$ and $n_2$ have been fixed. If we
allow them to be random variables, then the distributions whose p.f.'s are
given by (6.6.5), (6.6.10), (6.6.11), and (6.6.12) are conditional distributions
for fixed values of $n_1$ and $n_2$. If the p.f. of $n_1$ and $n_2$ is $p^*(n_1, n_2)$, the
product of each of these four p.f.'s by $p^*(n_1, n_2)$ would give the p.f. of
the $r$'s involved in that distribution and $n_1$ and $n_2$. To obtain the p.f. of the
$r$'s in any case we would have to sum the product for that case over the
range of values of $n_1$ and $n_2$. In particular, if we consider a given set of
independent trials, each trial resulting in a $C$ or $\bar{C}$, then $n_1$ and $n_2$ would
be linearly dependent random variables satisfying the condition $n_1 + n_2 = n$
such that $n_1$ would have a binomial distribution $Bi(n, p)$, where $p$ is the
probability of getting a $C$ in a single trial. In this case

(6.6.21)    $p^*(n_1, n_2) = \binom{n}{n_1} p^{n_1}(1-p)^{n_2}$

where $n_2 = n - n_1$. To find the $g$th factorial moment of $r_1$, for instance,
we would multiply the expression on the right of (6.6.16) by (6.6.21) and
sum with respect to $n_1$ from 0 to $n$, understanding, of course, that the
terms in this sum corresponding to $n_1 = 1, \ldots, g - 1$ would be zero.


(b) Runs of k Kinds of Elements

The preceding results can be extended in a reasonably direct manner to
runs generated by several kinds of elements. Suppose there is a total of $n$
elements consisting of $n_1$ $C_1$'s, $\ldots$, $n_k$ $C_k$'s where $n_1 + \cdots + n_k = n$. We
let $r_{ij}$ be a random variable denoting the number of runs of $C_i$ of length $j$.
Let $r_i = \sum_j r_{ij}$, the total number of runs of $C_i$'s. Mood (1940) has shown
that the p.f. of the set of $r_{ij}$ is

(6.6.22)    $p(\{r_{ij}\}) = \dfrac{n_1! \cdots n_k!}{n!} \left[\prod_{i=1}^{k} \dfrac{r_i!}{r_{i1}! \cdots r_{in_i}!}\right] \gamma(r_1, \ldots, r_k),$

where $\gamma(r_1, \ldots, r_k)$ is the number of ways $r_1$ objects of one kind, $r_2$ objects
of a second kind, and so on, can be arranged so that no two adjacent
objects are of the same kind. The function $\gamma(r_1, \ldots, r_k)$ is the coefficient
of $x_1^{r_1} \cdots x_k^{r_k}$ in the expansion of

(6.6.23)    $(x_1 + \cdots + x_k)^k (x_2 + \cdots + x_k)^{r_1-1} (x_1 + x_3 + \cdots + x_k)^{r_2-1} \cdots (x_1 + x_2 + \cdots + x_{k-1})^{r_k-1}.$

The argument for establishing (6.6.22) is a straightforward extension of
that for establishing (6.6.5), and is left to the reader. Similarly, an
extension of the argument leading from (6.6.5) to (6.6.11) will establish
the p.f. of $(r_1, \ldots, r_k)$ as

(6.6.24)    $p(\{r_i\}) = \dfrac{n_1! \cdots n_k!}{n!} \left[\prod_{i=1}^{k} \binom{n_i-1}{r_i-1}\right] \gamma(r_1, \ldots, r_k).$

Moments can be found from (6.6.22) and (6.6.24) by procedures similar
to those for the case of $k = 2$.
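The number of arrangements of $r_1$ objects of one kind, $r_2$ of a second kind, and so on, with no two adjacent objects alike can be computed in two independent ways for small cases. The sketch below counts such arrangements directly by recursion and, separately, extracts the coefficient of $x_1^{r_1} \cdots x_k^{r_k}$ from the product $(x_1 + \cdots + x_k)^k \prod_i (\text{sum of all } x_j \text{ with } j \ne i)^{r_i - 1}$. Reading the generating expression in this form is an assumption of the sketch, and the names `count_arrangements`, `poly_mul` and `coefficient_count` are illustrative only. Every $r_i$ is assumed to be at least 1.

```python
from functools import lru_cache


def count_arrangements(r):
    """Count orderings of r[i] objects of kind i with no two adjacent alike."""
    @lru_cache(maxsize=None)
    def extend(remaining, last):
        if sum(remaining) == 0:
            return 1
        total = 0
        for kind, left in enumerate(remaining):
            if left > 0 and kind != last:
                nxt = remaining[:kind] + (left - 1,) + remaining[kind + 1:]
                total += extend(nxt, kind)
        return total
    return extend(tuple(r), -1)


def poly_mul(p, q):
    """Multiply two polynomials stored as {exponent tuple: coefficient}."""
    out = {}
    for ea, ca in p.items():
        for eb, cb in q.items():
            e = tuple(a + b for a, b in zip(ea, eb))
            out[e] = out.get(e, 0) + ca * cb
    return out


def coefficient_count(r):
    """Coefficient of x_1^{r_1}...x_k^{r_k} in the assumed generating product."""
    k = len(r)
    unit = lambda i: tuple(1 if j == i else 0 for j in range(k))
    full = {unit(i): 1 for i in range(k)}
    poly = {tuple([0] * k): 1}
    for _ in range(k):                      # factor (x_1 + ... + x_k)^k
        poly = poly_mul(poly, full)
    for i, ri in enumerate(r):              # factors omitting x_i, power r_i - 1
        others = {unit(j): 1 for j in range(k) if j != i}
        for _ in range(ri - 1):
            poly = poly_mul(poly, others)
    return poly.get(tuple(r), 0)
```

For $k = 2$ both functions give 2 when $r_1 = r_2$, 1 when the two counts differ by one, and 0 otherwise, in agreement with the two-kind case treated earlier.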


PROBLEMS 

6.1 If $x$ is a random variable having the hypergeometric distribution
$H(N, n; p)$, show that

6.2 A jar of $m + n$ chips are numbered $1, 2, \ldots, m + n$. A set of $n$ chips
are drawn at random from the jar. Show that the probability is

$\binom{m+n-x-1}{m-1} \Big/ \binom{m+n}{m}$

that $x$ of the chips drawn have numbers exceeding all numbers on the chips
remaining in the jar. Also show that

$\mathscr{E}(x^{[g]}) = g!\,\binom{m+n}{m+g} \Big/ \binom{m+n}{m}.$

From this determine the mean and variance of the random variable $x$.





6.3 If $(x_1, \ldots, x_k)$ is a $k$-dimensional random variable having the $k$-variate
hypergeometric distribution $H(N, n; p_1, \ldots, p_k)$, show that the marginal
distribution of $(x_1, \ldots, x_{k_1})$, $k_1 < k$, is the hypergeometric distribution

$H(N, n; p_1, \ldots, p_{k_1}).$

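6.4 (Continuation) Show that the conditional random variable

$x_k \mid x_1, \ldots, x_{k-1}$

has the one-dimensional hypergeometric distribution

$H\!\left(N(p_k + p_{k+1}),\; n - x_1 - \cdots - x_{k-1};\; \dfrac{p_k}{p_k + p_{k+1}}\right).$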


6.5 (Continuation) Show that $x_1 + \cdots + x_{k_1}$ has the one-dimensional
hypergeometric distribution $H(N, n; p_1 + \cdots + p_{k_1})$.

6.6 Show by formally summing $p(x_1, \ldots, x_k)$ as defined in (6.1.6) over the
entire sample space that the sum is unity.

6.7 Prove 6.2.1. 

6.8 Prove 6.3.1. 

6.9 Prove 6.4.1. 

6.10 If $n \to \infty$ and $p_1, \ldots, p_k$ each $\to 0$ so that $np_1 = \mu_1, \ldots, np_k = \mu_k$,
show that the limit of the p.f. of $(x_1, \ldots, x_k)$ in (6.3.3) is the product of p.f.'s of
independent Poisson distributions $Po(\mu_1), \ldots, Po(\mu_k)$.

6.11 Show that the binomial waiting-time distribution whose p.f. is given by 
(6.5.15) is reproductive with respect to k. 

6.12 If $(x_1, \ldots, x_k)$ has the $k$-variate multinomial distribution

$M(n; p_1, \ldots, p_k),$

show by use of characteristic functions that the marginal distribution of
$(x_1, \ldots, x_{k_1})$, $k_1 < k$, is the $k_1$-variate multinomial distribution

$M(n; p_1, \ldots, p_{k_1}).$

6.13 (Continuation) Show that the distribution of the conditional random
variable $x_k \mid x_1, \ldots, x_{k-1}$ is the binomial distribution

$Bi\!\left(n - x_1 - x_2 - \cdots - x_{k-1},\; \dfrac{p_k}{p_k + p_{k+1}}\right).$

6.14 (Continuation) Show that $x_1 + \cdots + x_{k_1}$, $k_1 < k$, has the binomial
distribution $Bi(n, p_1 + \cdots + p_{k_1})$.

6.15 Show that the limit of the hypergeometric waiting-time distribution
given by (6.5.3) as $N \to \infty$ is the binomial waiting-time distribution given by
(6.5.15).

6.16 Verify formula (6.6.12).

6.17 Verify formula (6.6.18).

6.18 If $x$ is a random variable having the Poisson distribution $Po(\mu)$ and if
the conditional random variable $y \mid x$ has the binomial distribution $Bi(x, p)$, show
that the unconditional distribution of $y$ is the Poisson distribution $Po(\mu p)$.

6.19 If, in the binomial waiting-time distribution (6.5.15), $y$ is the new
random variable $x - k$, show that the limiting distribution of $y$ as $k \to \infty$ and
$q \to 0$ so that $kq \to \mu$ is the Poisson distribution $Po(\mu)$.


6.20 If $x$ has the Poisson distribution $Po(\mu)$ show that the c.d.f. of $x$ is given
by

$F(x) = \dfrac{1}{x!} \int_{\mu}^{\infty} e^{-z} z^{x}\, dz.$


6.21 Neyman's (1939) type A contagious distribution. Suppose $x$ is a random
variable having Poisson distribution $Po(\mu_1)$ and $y$ is a random variable such that
the conditional random variable $y \mid x$ has the Poisson distribution $Po(\mu_2 x)$. Show
that the characteristic function of the unconditional random variable $y$ is given
by

$\varphi(t) = \exp\{-\mu_1[1 - e^{-\mu_2(1 - e^{it})}]\}$

and that

$\mathscr{E}(y) = \mu_1\mu_2, \qquad \sigma^2(y) = \mu_1\mu_2(1 + \mu_2).$




6.22 If $x$ is a random variable having the binomial distribution $Bi(n, p)$ show
that the c.d.f. of $x$ is given by

$F(x) = (n - x)\binom{n}{x} \int_0^{1-p} y^{n-x-1}(1 - y)^{x}\, dy.$

6.23 The blood-testing problem. Persons in a large population are given blood
tests in groups of $k$ persons at a time as follows: Blood specimens are taken from
$k$ persons and pooled. If a test on the pooled blood is negative, a single test is
sufficient for the $k$ persons. If the test is positive, each person's blood is tested
separately. If $q$ is the probability of a negative test on a single person picked at
random, show that the value of $k$ which minimizes the expected value of the total
number of tests is the positive integer for which the interval

$\left( \left[\dfrac{1}{k(k+1)(1-q)}\right]^{1/k},\; \left[\dfrac{1}{k(k-1)(1-q)}\right]^{1/(k-1)} \right)$

contains $q$. If $q$ is near 1, show that an approximation to $k$ is the solution of the
equation

$k^2 q^k \log q + 1 = 0.$    [Dorfman (1943).]

6.24 Suppose a jar has $n$ chips numbered $1, 2, \ldots, n$. A person draws a chip,
returns it, draws another, returns it, and so on until he gets a chip which has been
drawn before and then stops. Let $x$ be a random variable denoting the number
of drawings required to accomplish this objective. Show that the p.f. of $x$ is given
by

$p(x) = (x - 1)!\,\binom{n}{x-1}\,\dfrac{x-1}{n^{x}}, \qquad x = 2, \ldots, n + 1.$

Verify that $\sum_{x=2}^{n+1} p(x) = 1$. Show that

$\mathscr{E}(x) = 2 + \left(1 - \dfrac{1}{n}\right) + \left(1 - \dfrac{1}{n}\right)\left(1 - \dfrac{2}{n}\right) + \cdots + \left(1 - \dfrac{1}{n}\right)\left(1 - \dfrac{2}{n}\right) \cdots \left(1 - \dfrac{n-1}{n}\right).$





6.25 In the binomial waiting-time problem of Section 6.5, suppose $x$ is the
number of trials required in order to obtain $k$ successive successes. Show that
the m.g.f. of $x$ is

$\psi(t) = (pe^{t})^{k}(1 - pe^{t})(1 - e^{t} + qp^{k}e^{(k+1)t})^{-1}$

and that

$\mathscr{E}(x) = \dfrac{1 - p^{k}}{qp^{k}}.$



6.26 The problem of class-size distributions: the multinomial case. In the
multinomial distribution having p.f. (6.3.3) suppose

$p_1 = \cdots = p_{k+1} = \dfrac{1}{k+1}.$

Let $r_0, r_1, \ldots, r_n$ be the numbers of the components of $(x_1, \ldots, x_{k+1})$ which are
$0, 1, \ldots, n$, respectively. Show that the p.f. of $(r_0, \ldots, r_n)$ is given by

$\dfrac{n!\,(k+1)!\,(k+1)^{-n}}{r_0!\, r_1! \cdots r_n!\,(0!)^{r_0}(1!)^{r_1} \cdots (n!)^{r_n}},$

$(r_0, r_1, \ldots, r_n)$ being subject to the conditions

$r_0 + r_1 + \cdots + r_n = k + 1, \qquad r_1 + 2r_2 + \cdots + nr_n = n.$

Furthermore, show that

$\mathscr{E}[r_0^{[s_0]} r_1^{[s_1]} \cdots r_n^{[s_n]}] = \dfrac{n!\,(k+1)!\,(k+1)^{-n}(k+1-B)^{n-A}}{(0!)^{s_0}(1!)^{s_1} \cdots (n!)^{s_n}\,(n-A)!\,(k+1-B)!}$

where $A = s_1 + 2s_2 + \cdots + ns_n$ and $B = s_0 + s_1 + \cdots + s_n$. From this
expression find the value of $\mathscr{E}(r_i)$, $\sigma^2(r_i)$ and cov$(r_i, r_j)$. [Tukey (1949b).]

6.27 The problem of class-size distributions: the hypergeometric case.
Suppose a deck of $M(k + 1)$ cards has $M$ cards of each of $k + 1$ different
“suits” and after thorough shuffling suppose a “hand” of $n$ cards is dealt. Let
$r_0$ be the number of 0-card (blank) suits in the “hand,” $r_1$ the number of 1-card
suits in the “hand,” $r_2$ the number of 2-card suits in the “hand,” and so on. Show
that the p.f. of the $(n + 1)$-dimensional random variable $(r_0, r_1, \ldots, r_n)$ is given
by

$\dfrac{(M!)^{k+1}(k+1)!}{\binom{M(k+1)}{n}\, r_0!\, r_1! \cdots r_n!\,[0!\,(M-0)!]^{r_0}[1!\,(M-1)!]^{r_1} \cdots [n!\,(M-n)!]^{r_n}}.$

Furthermore, show that

$\mathscr{E}[r_0^{[s_0]} r_1^{[s_1]} \cdots r_n^{[s_n]}] = \dfrac{(M!)^{B}(k+1)^{[B]}\binom{M(k+1-B)}{n-A}}{\binom{M(k+1)}{n}\,[0!\,(M-0)!]^{s_0} \cdots [n!\,(M-n)!]^{s_n}}$

where $A = s_1 + 2s_2 + \cdots + ns_n$ and $B = s_0 + s_1 + \cdots + s_n$. From this
expression find the value of $\mathscr{E}(r_i)$, $\sigma^2(r_i)$, and cov$(r_i, r_j)$. [Tukey (1949b).]

6.28 The card-matching problem for two decks of cards. Let $A$ be a deck of $N$
cards, each card belonging to one and only one of the “suits” $S_1, \ldots, S_k$, the
numbers of cards belonging to these suits being $m_1, \ldots, m_k$, respectively.
Similarly let $B$ be a second deck of $N$ cards having $n_1, \ldots, n_k$ cards belonging to
suits $S_1, \ldots, S_k$, respectively. Deck $A$ is shuffled and the cards are dealt face up
along a line. Similarly, deck $B$ is shuffled and the cards are dealt face up along
a line immediately below the line formed by the cards in deck $A$. Let $x$ be a
random variable denoting the number of pairs of cards in the two decks which
match in suit. Assigning all possible permutations of cards in each suit equal
probabilities, show that the p.f. of $x$ is given by the coefficient of

$e^{tx} a_1^{m_1} \cdots a_k^{m_k} b_1^{n_1} \cdots b_k^{n_k}$

in the expansion of

$\dfrac{1}{K}\left(\sum_{i,j=1}^{k} a_i b_j e^{\delta_{ij}t}\right)^{N}$

where

$\delta_{ij} = 0,\ i \neq j$ and $1,\ i = j$

and

$K = \dfrac{(N!)^2}{m_1! \cdots m_k!\, n_1! \cdots n_k!}.$

Hence show that the $r$th moment $\mu_r'$ of $x$ is given by the coefficient of
$a_1^{m_1} \cdots a_k^{m_k} b_1^{n_1} \cdots b_k^{n_k}$ in the expansion of


In particular, show that 

... ^ W-T/, 




[Battin (1942), and Kaplansky and Riordan (1945).] 



CHAPTER 7 


Some Special Continuous Distributions 


In this chapter we present some of the more important continuous 
probability distributions of mathematical statistics together with some of 
their properties. Many of these results will be used in subsequent chapters.


7.1 THE RECTANGULAR DISTRIBUTION 


The simplest continuous distribution, and one which we shall find it
useful to define here, is that which has the following p.d.f.

(7.1.1)    $f(x) = \begin{cases} \dfrac{1}{\omega}, & \mu - \dfrac{\omega}{2} < x \le \mu + \dfrac{\omega}{2} \\[4pt] 0, & x \le \mu - \dfrac{\omega}{2} \\[4pt] 0, & x > \mu + \dfrac{\omega}{2}. \end{cases}$

It will be convenient to call the probability distribution having this
p.d.f. the rectangular distribution $R(\mu, \omega)$. This distribution has the
following mean and variance

(7.1.2)    $\mathscr{E}(x) = \mu, \qquad \sigma^2(x) = \dfrac{\omega^2}{12}.$

The parameter $\omega$ is called the range of the distribution.

It will be noted that if $x$ is a random variable having the rectangular
distribution $R(\mu, \omega)$, then

$y = \dfrac{x - \mu}{\omega} + \dfrac{1}{2}$

is a random variable having the rectangular distribution $R(\tfrac{1}{2}, 1)$, which has
the nonzero part of its p.d.f. on the interval $[0, 1]$.

An important case of a random variable having the rectangular
distribution $R(\tfrac{1}{2}, 1)$ is embodied in the following statement:

If $x$ is a random variable having a continuous c.d.f. $F(x)$ then the
random variable $y = F(x)$ has the rectangular distribution $R(\tfrac{1}{2}, 1)$.

This follows at once from the fact that the c.d.f. of $y$ is

$H(y) = P(F(x) \le y) = \begin{cases} 1, & y > 1 \\ y, & 0 < y \le 1 \\ 0, & y \le 0 \end{cases}$

which is the c.d.f. of the rectangular distribution $R(\tfrac{1}{2}, 1)$.
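The statement that $y = F(x)$ is rectangular on $[0, 1]$ whenever $F$ is continuous can be illustrated by simulation. The sketch below draws normal variates, applies the normal c.d.f. to each, and summarizes the transformed sample, whose mean and variance should be close to $1/2$ and $1/12$. The choice of a normal parent distribution, the parameter values, the seed, and the function names are all assumptions made for illustration.

```python
import math
import random


def normal_cdf(x, mu, sigma):
    """C.d.f. of a normal variate, expressed through the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def transformed_sample(n, mu=2.0, sigma=3.0, seed=12345):
    """Draw n normal variates and map each through its own c.d.f."""
    rng = random.Random(seed)
    return [normal_cdf(rng.gauss(mu, sigma), mu, sigma) for _ in range(n)]


def sample_mean_and_variance(values):
    """Plain sample mean and (biased) sample variance."""
    m = sum(values) / len(values)
    v = sum((y - m) ** 2 for y in values) / len(values)
    return m, v
```

With a sample of a few thousand values the two summaries should agree with $1/2$ and $1/12$ to about two decimal places; the agreement is statistical, not exact.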


7.2 THE NORMAL DISTRIBUTION 

The most important distribution function of a continuous random
variable is the normal or Gaussian distribution. Its p.d.f. may be written as

(7.2.1)    $f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}(x-\mu)^2/\sigma^2}$

for $-\infty < x < +\infty$, where $\mu$ and $\sigma^2$ are parameters. It will be shown
presently that these two parameters are actually the mean and variance,
respectively, of the normal distribution function. It will be convenient to
refer to the distribution having p.d.f. (7.2.1) as the normal distribution
$N(\mu, \sigma^2)$, or simply the distribution $N(\mu, \sigma^2)$ since the normal distribution
occurs so frequently. The graph of (7.2.1) as shown in Fig. 7.1 is
symmetrical with respect to $x = \mu$ with maximum ordinate $1/(\sqrt{2\pi}\,\sigma)$ at
$x = \mu$. The graph has inflection points at $x = \mu \pm \sigma$.

Fig. 7.1 Graph of normal p.d.f. (7.2.1)

The most convenient form of the normal distribution for tabulation is
that corresponding to a random variable $y$, where $y = (x - \mu)/\sigma$. The
p.d.f. of $y$ is the standardized form $N(0, 1)$ of the normal distribution. Note
that

$P(x \le x') = P(y \le y') = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{y'} e^{-\frac{1}{2}y^2}\, dy$

where $y' = (x' - \mu)/\sigma$.

The c.d.f. $\Phi(x)$ of the standardized form of the normal distribution,
defined by

(7.2.1a)    $\Phi(x) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}y^2}\, dy,$

is widely tabulated, an excellent table being that prepared by the National
Bureau of Standards (1942). Greenwood and Hartley (1961) give an
extensive index of the various tabulations.

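First we show that the integral of the normal distribution function over
the entire $x$-axis is unity.

By putting $y = (x - \mu)/\sigma$ we can write

(7.2.2)    $\dfrac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(x-\mu)^2/\sigma^2}\, dx = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}y^2}\, dy.$

Let $I$ denote the integral on the right without the constant $1/\sqrt{2\pi}$. Our
problem is to show that $I = \sqrt{2\pi}$. Now

(7.2.3)    $I^2 = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{1}{2}(y_1^2 + y_2^2)}\, dy_1\, dy_2.$

Applying the following transformation to polar coordinates

$y_1 = r\cos\theta$
$y_2 = r\sin\theta$

we have

(7.2.3a)    $I^2 = \int_0^{2\pi}\int_0^{\infty} r e^{-\frac{1}{2}r^2}\, dr\, d\theta = 2\pi$

and hence $I = \sqrt{2\pi}$.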
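The value $\sqrt{2\pi}$ for the integral of $e^{-y^2/2}$ over the real line can be confirmed by direct numerical quadrature. The sketch below applies the trapezoidal rule on a finite interval; the interval half-width, the number of steps, and the function name `gaussian_integral` are arbitrary illustrative choices, and the truncated tails are negligible at the default width.

```python
import math


def gaussian_integral(half_width=12.0, steps=24000):
    """Trapezoidal estimate of the integral of exp(-y^2/2) over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0   # end points get half weight
        total += weight * math.exp(-0.5 * y * y)
    return total * h
```

Comparing `gaussian_integral()` with `math.sqrt(2.0 * math.pi)` gives agreement far beyond the accuracy of ordinary tables.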

Now consider the characteristic function of (7.2.1). We have

(7.2.4)    $\varphi(t) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{itx - \frac{1}{2}(x-\mu)^2/\sigma^2}\, dx$

which can be written as

(7.2.4a)    $\varphi(t) = e^{i\mu t - \frac{1}{2}\sigma^2t^2} \cdot \dfrac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(x - \mu - i\sigma^2t)^2/\sigma^2}\, dx.$

The integral in (7.2.4a) is the integral of the function $e^{-\frac{1}{2}z^2/\sigma^2}$, where $z$ is a
complex variable, in the complex plane along a line parallel to the real
axis, namely, the line $y = -\sigma^2t$, which can be shown to be equal to the
integral of the same function along the real axis, that is, the integral
$\int_{-\infty}^{\infty} e^{-\frac{1}{2}x^2/\sigma^2}\, dx$. But this integral has the value $\sigma\sqrt{2\pi}$. Therefore,

7.2.1 The characteristic function of the distribution $N(\mu, \sigma^2)$ is

(7.2.5)    $\varphi(t) = e^{i\mu t - \frac{1}{2}\sigma^2t^2}.$

By the usual differentiation procedure we find the mean and variance of
the distribution $N(\mu, \sigma^2)$ to be

(7.2.6)    $\mathscr{E}(x) = \mu, \qquad \sigma^2(x) = \sigma^2.$

Treating $(\mu, \sigma^2)$ as a vector parameter in the characteristic function
(7.2.5) it is evident that the characteristic function satisfies (5.3.7) and hence

7.2.2 The distribution $N(\mu, \sigma^2)$ is reproductive with respect to $(\mu, \sigma^2)$.

In fact, it has a stronger property than merely being reproductive which
may be stated as follows:

7.2.3 Suppose $x_1$ and $x_2$ are independent random variables having
distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. Let $L = c_1x_1 + c_2x_2$,
where $c_1$ and $c_2$ are real constants, not both 0. Then the distribution
of $L$ is $N(c_1\mu_1 + c_2\mu_2,\; c_1^2\sigma_1^2 + c_2^2\sigma_2^2)$.

A quick way to verify this statement is to set up the characteristic
function of $L$. We have by (5.3.6)

$\varphi(t) = \varphi_1(c_1t) \cdot \varphi_2(c_2t)$

where $\varphi_1(t)$ and $\varphi_2(t)$ are the characteristic functions of $x_1$ and $x_2$
respectively. Thus,

$\varphi_j(c_jt) = e^{i\mu_jc_jt - \frac{1}{2}\sigma_j^2c_j^2t^2}, \qquad j = 1, 2,$

and we have

(7.2.7)    $\varphi(t) = \exp[i(c_1\mu_1 + c_2\mu_2)t - \tfrac{1}{2}(c_1^2\sigma_1^2 + c_2^2\sigma_2^2)t^2]$

which, by (7.2.5), is seen to be the characteristic function of the normal
distribution $N(c_1\mu_1 + c_2\mu_2,\; c_1^2\sigma_1^2 + c_2^2\sigma_2^2)$. Applying 5.1.3 completes the
argument for theorem 7.2.3. This theorem can be extended immediately
to the case of $k$ independent random variables having normal distributions.
This extension is left as an exercise for the reader.
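The characteristic-function argument for a linear combination of two independent normal variables can be traced numerically. The sketch below evaluates the product $\varphi_1(c_1t)\,\varphi_2(c_2t)$ and, separately, the characteristic function of a normal distribution with mean $c_1\mu_1 + c_2\mu_2$ and variance $c_1^2\sigma_1^2 + c_2^2\sigma_2^2$, so that the two can be compared at any $t$. The function names are illustrative, and the variance (not the standard deviation) is passed as the second parameter.

```python
import cmath


def normal_cf(t, mu, var):
    """Characteristic function exp(i*mu*t - var*t^2/2) of a normal distribution."""
    return cmath.exp(1j * mu * t - 0.5 * var * t * t)


def cf_of_combination(t, c1, c2, mu1, var1, mu2, var2):
    """C.f. of L = c1*x1 + c2*x2 for independent normal x1, x2, as a product."""
    return normal_cf(c1 * t, mu1, var1) * normal_cf(c2 * t, mu2, var2)


def cf_of_claimed_normal(t, c1, c2, mu1, var1, mu2, var2):
    """C.f. of the normal distribution that L is stated to follow."""
    mean = c1 * mu1 + c2 * mu2
    var = c1 * c1 * var1 + c2 * c2 * var2
    return normal_cf(t, mean, var)
```

The two functions agree to rounding error for every real $t$, which is the numerical counterpart of the identity used in the argument.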

7.3 THE BIVARIATE NORMAL DISTRIBUTION 

The normal distribution occupies a position of such importance in the
theory of probability and statistics that it will be useful to discuss the
two-dimensional case in some detail before proceeding to the $k$-variate
case. The most convenient form of the p.d.f. of the bivariate normal
distribution is

(7.3.1)    $f(x_1, x_2) = \dfrac{\sqrt{|\sigma^{ij}|}}{2\pi}\, e^{-\frac{1}{2}Q(x_1, x_2)}$

where $(x_1, x_2)$ is any point in $R_2$, and

(7.3.2)    $Q(x_1, x_2) = \sigma^{11}(x_1 - \mu_1)^2 + 2\sigma^{12}(x_1 - \mu_1)(x_2 - \mu_2) + \sigma^{22}(x_2 - \mu_2)^2.$

We shall presently show that $\mu_1$ and $\mu_2$ are the means of $x_1$ and $x_2$
respectively, and that the matrix $\|\sigma^{ij}\|$, $i, j = 1, 2$, where $\sigma^{12} = \sigma^{21}$, is the
inverse of the covariance matrix $\|\sigma_{ij}\|$ of $x_1$ and $x_2$ as defined in (3.5.3).
Needless to say, we assume that $\|\sigma_{ij}\|$ is positive definite, which implies
that $\|\sigma^{ij}\|$ is positive definite, that is, $Q(x_1, x_2)$ is a positive definite
quadratic form.

We shall find it convenient to refer to a bivariate normal distribution
having p.d.f. (7.3.1) as $N(\{\mu_i\}, \|\sigma_{ij}\|)$, $i, j = 1, 2$.

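First, let us verify that

(7.3.3)    $\int\!\!\int_{R_2} f(x_1, x_2)\, dx_1\, dx_2 = 1$

which is equivalent to showing that

(7.3.4)    $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{1}{2}Q(x_1, x_2)}\, dx_1\, dx_2 = \dfrac{2\pi}{\sqrt{|\sigma^{ij}|}}.$

Putting $y_1 = x_1 - \mu_1$ and $y_2 = x_2 - \mu_2$ we can write the left-hand side
of (7.3.4) as

(7.3.5)    $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \exp\left\{-\tfrac{1}{2}\left[\sqrt{\sigma^{11}}\left(y_1 + \dfrac{\sigma^{12}}{\sigma^{11}}y_2\right)\right]^2 - \tfrac{1}{2}\dfrac{|\sigma^{ij}|}{\sigma^{11}}y_2^2\right\} dy_1\, dy_2.$

Making the transformation

(7.3.6)    $z_1 = \sqrt{\sigma^{11}}\left(y_1 + \dfrac{\sigma^{12}}{\sigma^{11}}y_2\right), \qquad z_2 = \sqrt{\dfrac{|\sigma^{ij}|}{\sigma^{11}}}\; y_2$

which has the Jacobian

$\dfrac{\partial(y_1, y_2)}{\partial(z_1, z_2)} = \dfrac{1}{\sqrt{|\sigma^{ij}|}}$

we find that the left-hand side of (7.3.4) reduces to

$\dfrac{1}{\sqrt{|\sigma^{ij}|}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{1}{2}(z_1^2 + z_2^2)}\, dz_1\, dz_2.$

But we know from (7.2.3) and (7.2.3a) that this double integral has value
$2\pi$. Therefore (7.3.4) is established.

To show that $\mu_1$ and $\mu_2$ are the means of $x_1$ and $x_2$ note that by
differentiating both sides of (7.3.3) with respect to $\mu_1$ and $\mu_2$ we obtain

$\mathscr{E}[\sigma^{11}(x_1 - \mu_1) + \sigma^{12}(x_2 - \mu_2)] = 0$

$\mathscr{E}[\sigma^{12}(x_1 - \mu_1) + \sigma^{22}(x_2 - \mu_2)] = 0$

which can be written as

(7.3.7)    $\sigma^{11}\mathscr{E}(x_1 - \mu_1) + \sigma^{12}\mathscr{E}(x_2 - \mu_2) = 0$

           $\sigma^{12}\mathscr{E}(x_1 - \mu_1) + \sigma^{22}\mathscr{E}(x_2 - \mu_2) = 0.$

These are homogeneous linear equations in $\mathscr{E}(x_1 - \mu_1)$ and $\mathscr{E}(x_2 - \mu_2)$
and, since $|\sigma^{ij}| \neq 0$, will have the unique solution

(7.3.8)    $\mathscr{E}(x_1 - \mu_1) = 0, \qquad \mathscr{E}(x_2 - \mu_2) = 0.$

But formulas (7.3.8) imply that

(7.3.9)    $\mu_1 = \mathscr{E}(x_1), \qquad \mu_2 = \mathscr{E}(x_2)$

that is, $\mu_1$ and $\mu_2$ are the means of $x_1$ and $x_2$ respectively.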

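Now consider the variances and covariances. If we differentiate both
sides of (7.3.4) with respect to $\sigma^{11}$ and then multiply both sides by $-\sqrt{|\sigma^{ij}|}/\pi$
we obtain

$\dfrac{\sqrt{|\sigma^{ij}|}}{2\pi} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x_1 - \mu_1)^2 e^{-\frac{1}{2}Q(x_1, x_2)}\, dx_1\, dx_2 = \dfrac{\sigma^{22}}{|\sigma^{ij}|}.$

But the left-hand side is the expression for $\mathscr{E}(x_1 - \mu_1)^2$, the variance of $x_1$.
Denoting this variance by $\sigma_{11}$ we have

$\sigma_{11} = \dfrac{\sigma^{22}}{|\sigma^{ij}|}.$

Denoting the covariance between $x_1$ and $x_2$ by $\sigma_{12}$ and the variance of $x_2$
by $\sigma_{22}$ we obtain, in a similar manner,

$\sigma_{12} = -\dfrac{\sigma^{12}}{|\sigma^{ij}|}, \qquad \sigma_{22} = \dfrac{\sigma^{11}}{|\sigma^{ij}|}.$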

Thus, we find the covariance matrix $\|\sigma_{ij}\|$, expressed in terms of the
parameters $\sigma^{ij}$ in (7.3.2), to be

(7.3.10)    $\|\sigma_{ij}\| = \|\sigma^{ij}\|^{-1}.$

Hence, we have the following result:

7.3.1 The constants $\mu_1$ and $\mu_2$ in (7.3.2) are the means of $x_1$ and $x_2$, while
the matrix $\|\sigma^{ij}\|$ is the inverse of the covariance matrix $\|\sigma_{ij}\|$ of $x_1$
and $x_2$.

If we write the variances of $x_1$ and $x_2$ as $\sigma_1^2$ and $\sigma_2^2$ and the covariance
as $\sigma_1\sigma_2\rho$, where $\rho$ is the correlation coefficient between $x_1$ and $x_2$, then the
bivariate normal p.d.f. (7.3.1) may be written in the commonly used
alternative form

(7.3.11)    $f(x_1, x_2) = \dfrac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\, e^{-\frac{1}{2}Q(x_1, x_2)}$

where

(7.3.12)    $Q(x_1, x_2) = \dfrac{1}{1 - \rho^2}\left[\left(\dfrac{x_1 - \mu_1}{\sigma_1}\right)^2 - 2\rho\left(\dfrac{x_1 - \mu_1}{\sigma_1}\right)\left(\dfrac{x_2 - \mu_2}{\sigma_2}\right) + \left(\dfrac{x_2 - \mu_2}{\sigma_2}\right)^2\right].$

It should be noted that the normal p.d.f. is constant on any ellipse of
the form $Q(x_1, x_2) =$ constant, and that the greatest value of $f(x_1, x_2)$
occurs at the center of gravity of the distribution, namely, $(\mu_1, \mu_2)$.

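Now let us determine the characteristic function of $(x_1, x_2)$, that is,

(7.3.13)    $\varphi(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{i(t_1x_1 + t_2x_2)} f(x_1, x_2)\, dx_1\, dx_2.$

Again letting $y_1 = x_1 - \mu_1$ and $y_2 = x_2 - \mu_2$ we find

(7.3.14)    $\varphi(t_1, t_2) = \dfrac{e^{i(\mu_1t_1 + \mu_2t_2)}}{2\pi\sqrt{|\sigma_{ij}|}} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-\frac{1}{2}Q'(y_1, y_2)}\, dy_1\, dy_2$

where $Q'(y_1, y_2) = \sigma^{11}y_1^2 + 2\sigma^{12}y_1y_2 + \sigma^{22}y_2^2 - 2it_1y_1 - 2it_2y_2$.

But $Q'(y_1, y_2)$ can be written as

(7.3.15)    $Q'(y_1, y_2) = z_1^2 + z_2^2 + (\sigma_{11}t_1^2 + 2\sigma_{12}t_1t_2 + \sigma_{22}t_2^2)$

where

$z_1 = \sqrt{\sigma^{11}}\left(y_1' + \dfrac{\sigma^{12}}{\sigma^{11}}y_2'\right), \qquad z_2 = \sqrt{\dfrac{|\sigma^{ij}|}{\sigma^{11}}}\; y_2',$

with $y_1' = y_1 - i(\sigma_{11}t_1 + \sigma_{12}t_2)$ and $y_2' = y_2 - i(\sigma_{12}t_1 + \sigma_{22}t_2)$.

The Jacobian of the transformation by which the $y$'s and $z$'s are related is

$\dfrac{\partial(y_1, y_2)}{\partial(z_1, z_2)} = \dfrac{1}{\sqrt{|\sigma^{ij}|}}.$

Utilizing (7.2.3) and (7.2.3a), we find from (7.3.14) that

7.3.2 The characteristic function of the bivariate normal distribution (7.3.1)
[or (7.3.11)] is

(7.3.16)    $\varphi(t_1, t_2) = \exp[i(\mu_1t_1 + \mu_2t_2) - \tfrac{1}{2}(\sigma_{11}t_1^2 + 2\sigma_{12}t_1t_2 + \sigma_{22}t_2^2)].$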

The reader can verify that the bivariate normal distribution is reproductive
with respect to the vector of means and the covariance matrix.

We should remark that $z_1$ and $z_2$ are both complex variables, and that
each of the integrals $\int e^{-\frac{1}{2}z_1^2}\, dz_1$ and $\int e^{-\frac{1}{2}z_2^2}\, dz_2$, obtained when the $y$'s in
(7.3.14) are transformed to $z$'s, is actually taken along a line in the complex
plane parallel to the real axis. But each integral has the same value as if
taken along the real axis.

It should be particularly noted that the matrix of the quadratic form in
$t_1$ and $t_2$ in the characteristic function is the covariance matrix $\|\sigma_{ij}\|$ while
the matrix of the quadratic form in $x_1$ and $x_2$ in the p.d.f. (7.3.1) is the
inverse of the covariance matrix, namely $\|\sigma^{ij}\|$.

The reader will find it instructive to verify by applying the usual
differentiation procedure to (7.3.16) that the means of $x_1$ and $x_2$ are $\mu_1$
and $\mu_2$ and the covariance matrix of $x_1$ and $x_2$ is $\|\sigma_{ij}\|$.

If we put $t_2 = 0$ in (7.3.16), we obtain the characteristic function of $x_1$,
namely,

(7.3.17)    $\varphi(t_1, 0) = e^{i\mu_1t_1 - \frac{1}{2}\sigma_{11}t_1^2}$

which by 7.2.1 implies that

7.3.3 The marginal distribution of $x_1$ in the distribution $N(\{\mu_i\}, \|\sigma_{ij}\|)$,
$i, j = 1, 2$, is the distribution $N(\mu_1, \sigma_{11})$.

A similar statement holds, of course, for the marginal distribution of $x_2$.
More generally, if $L = c_1x_1 + c_2x_2$ where $c_1$ and $c_2$ are not both zero,
the characteristic function of $L$ is given by

(7.3.18)    $\varphi(c_1t, c_2t) = \exp[i(c_1\mu_1 + c_2\mu_2)t - \tfrac{1}{2}(\sigma_{11}c_1^2 + 2\sigma_{12}c_1c_2 + \sigma_{22}c_2^2)t^2]$

and hence we have the following result:

7.3.4 If $x_1$ and $x_2$ are random variables having the distribution $N(\{\mu_i\}, \|\sigma_{ij}\|)$,
$i, j = 1, 2$, then $L = c_1x_1 + c_2x_2$ has the distribution

$N(c_1\mu_1 + c_2\mu_2,\; \sigma_{11}c_1^2 + 2\sigma_{12}c_1c_2 + \sigma_{22}c_2^2).$

Note that 7.2.3 is a special case of 7.3.4 obtained by putting the covariance
$\sigma_{12} = 0$ and denoting $\sigma_{11}$ and $\sigma_{22}$ by $\sigma_1^2$ and $\sigma_2^2$ respectively.

Now consider the conditional random variable $x_2 \mid x_1$, where $(x_1, x_2)$ has
a bivariate normal distribution. The marginal p.d.f. of $x_1$ is

(7.3.19)    $f_1(x_1) = \dfrac{1}{\sqrt{2\pi}\,\sigma_1}\, e^{-\frac{1}{2}(x_1 - \mu_1)^2/\sigma_1^2}$

where $\sigma_1^2 = \sigma_{11}$.

Using form (7.3.11) of the bivariate normal p.d.f. we have from (2.9.13)

$f(x_2 \mid x_1) = \dfrac{1}{\sqrt{2\pi}\,\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\tfrac{1}{2}\left[Q(x_1, x_2) - \dfrac{(x_1 - \mu_1)^2}{\sigma_1^2}\right]\right\}$

where $Q(x_1, x_2)$ is given by (7.3.12). After some algebraic simplification,
we find that

(7.3.20)    $f(x_2 \mid x_1) = \dfrac{1}{\sqrt{2\pi}\,\sigma_2\sqrt{1 - \rho^2}} \exp\left\{-\dfrac{1}{2\sigma_2^2(1 - \rho^2)}\left[x_2 - \mu_2 - \rho\dfrac{\sigma_2}{\sigma_1}(x_1 - \mu_1)\right]^2\right\}$

from which we can make the following statement:

7.3.5 In the bivariate normal distribution having p.d.f. (7.3.11)
the conditional random variable $x_2 \mid x_1$ has the distribution

$N\!\left(\mu_2 + \rho\dfrac{\sigma_2}{\sigma_1}(x_1 - \mu_1),\; \sigma_2^2(1 - \rho^2)\right).$

It can be seen from considerations of symmetry that a similar statement
is true for the conditional distribution of $x_1 \mid x_2$.

It should be noted that the mean value of $x_2 \mid x_1$, that is, the regression
function of $x_2$ on $x_1$ in (7.3.20), is a linear function of $x_1$, namely,

$\mu(x_2 \mid x_1) = \mu_2 + \rho\dfrac{\sigma_2}{\sigma_1}(x_1 - \mu_1).$

Hence the equation of the regression line of $x_2$ on $x_1$ is

(7.3.21)    $x_2 = \mu_2 + \rho\dfrac{\sigma_2}{\sigma_1}(x_1 - \mu_1)$

which, it should be observed, is exactly the same as the least squares
regression line of $x_2$ on $x_1$ given by (3.8.6). A similar statement holds
regarding the regression function of $x_1$ on $x_2$. This substantiates a remark
made in Section 3.8(a) which can now be stated more precisely as follows:

7.3.6 The bivariate normal distribution has the property that its two
regression functions are linear and are identically the same as the
regression lines provided by the method of least squares. Furthermore,
the variances of the conditional random variables $x_2 \mid x_1$ and
$x_1 \mid x_2$ are identically the same as the least squares residual variances
$\sigma_{2 \cdot 1}^2$ and $\sigma_{1 \cdot 2}^2$ respectively, defined in (3.8.8).
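The conditional mean and variance of $x_2$ given $x_1$ can be obtained either from the $(\sigma_1, \sigma_2, \rho)$ form or from the inverse of the covariance matrix, since completing the square in $x_2$ in the exponent gives the mean $\mu_2 - (\sigma^{12}/\sigma^{22})(x_1 - \mu_1)$ and the variance $1/\sigma^{22}$. The sketch below computes both versions so that they can be compared; the function names and the numerical inputs used with them are illustrative assumptions.

```python
def inverse_2x2(s11, s12, s22):
    """Elements (a11, a12, a22) of the inverse of a symmetric 2 x 2 matrix."""
    det = s11 * s22 - s12 * s12
    return s22 / det, -s12 / det, s11 / det


def conditional_from_rho(x1, mu1, mu2, sigma1, sigma2, rho):
    """Mean and variance of x2 | x1 written with sigma1, sigma2 and rho."""
    mean = mu2 + rho * (sigma2 / sigma1) * (x1 - mu1)
    variance = sigma2 ** 2 * (1.0 - rho ** 2)
    return mean, variance


def conditional_from_inverse(x1, mu1, mu2, sigma1, sigma2, rho):
    """The same quantities read off from the inverse covariance matrix."""
    a11, a12, a22 = inverse_2x2(sigma1 ** 2, rho * sigma1 * sigma2, sigma2 ** 2)
    mean = mu2 - (a12 / a22) * (x1 - mu1)
    variance = 1.0 / a22
    return mean, variance
```

For instance, with $\sigma_1 = 2$, $\sigma_2 = 3$, $\rho = 0.5$ the regression slope is $0.75$ and the conditional variance is $6.75$ by either route.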

7.4 THE k-VARIATE NORMAL DISTRIBUTION

(a) Structure of the p.d.f. of the k-Variate Normal Distribution

The multivariate or $k$-variate normal distribution and its properties are
straightforward extensions of the case for $k = 2$ treated in Section 7.3.
The p.d.f. of the $k$-variate normal distribution is

(7.4.1)    $f(x_1, \ldots, x_k) = \dfrac{\sqrt{|\sigma^{ij}|}}{(2\pi)^{k/2}}\, e^{-\frac{1}{2}Q(x_1, \ldots, x_k)}$

where $(x_1, \ldots, x_k)$ is any point in $R_k$ and

(7.4.2)    $Q(x_1, \ldots, x_k) = \sum_{i,j=1}^{k} \sigma^{ij}(x_i - \mu_i)(x_j - \mu_j)$

$\|\sigma^{ij}\|$ being the inverse of the covariance matrix $\|\sigma_{ij}\|$ and the $\mu_i$ the means of
the $x_i$.

We assume that $\|\sigma_{ij}\|$ is positive definite, which implies that $|\sigma_{ij}| \neq 0$,
and hence $|\sigma^{ij}| \neq 0$. The distribution having p.d.f. (7.4.1) will be called
the $k$-variate normal distribution and will be denoted by $N(\{\mu_i\}, \|\sigma_{ij}\|)$,
$i, j = 1, \ldots, k$. The sample space of the random variable $(x_1, \ldots, x_k)$ is
the entire $k$-dimensional space $R_k$.

If we let $y_i = x_i - \mu_i$, $i = 1, \ldots, k$, verification that the integral of
$f(x_1, \ldots, x_k)$ over $R_k$ equals unity is equivalent to showing that

(7.4.3)    $\int_{R_k} \exp\left(-\tfrac{1}{2}\sum_{i,j=1}^{k} \sigma^{ij}y_iy_j\right) dy_1 \cdots dy_k = \dfrac{(2\pi)^{k/2}}{\sqrt{|\sigma^{ij}|}}.$

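We can write

$\sum_{i,j=1}^{k} \sigma^{ij}y_iy_j = \sigma^{11}\left(y_1 + \sum_{j=2}^{k} \dfrac{\sigma^{1j}}{\sigma^{11}}y_j\right)^2 + \sum_{i,j=2}^{k}\left(\sigma^{ij} - \dfrac{\sigma^{1i}\sigma^{1j}}{\sigma^{11}}\right)y_iy_j.$

Letting

$z_1 = \sqrt{\sigma^{11}}\left(y_1 + \sum_{j=2}^{k} \dfrac{\sigma^{1j}}{\sigma^{11}}y_j\right)$

and letting

$\sigma_{(1)}^{ij} = \sigma^{ij} - \dfrac{\sigma^{1i}\sigma^{1j}}{\sigma^{11}}, \qquad i, j = 2, \ldots, k$

we have

$\sum_{i,j=1}^{k} \sigma^{ij}y_iy_j = z_1^2 + \sum_{i,j=2}^{k} \sigma_{(1)}^{ij}y_iy_j.$

Continuing this process and setting up the notation

$\sigma_{(p)}^{ij} = \sigma_{(p-1)}^{ij} - \dfrac{\sigma_{(p-1)}^{pi}\sigma_{(p-1)}^{pj}}{\sigma_{(p-1)}^{pp}}, \qquad i, j = p + 1, \ldots, k;\; p = 1, \ldots, k - 1$

$\sigma_{(0)}^{ij} = \sigma^{ij}, \qquad i, j = 1, \ldots, k,$

we can write

$\sum_{i,j=1}^{k} \sigma^{ij}y_iy_j = \sum_{i=1}^{k} z_i^2$

where

(7.4.4)    $z_i = \sqrt{\sigma_{(i-1)}^{ii}}\left(y_i + \sum_{j=i+1}^{k} \dfrac{\sigma_{(i-1)}^{ij}}{\sigma_{(i-1)}^{ii}}y_j\right), \qquad i = 1, \ldots, k.$

Since $\|\sigma^{ij}\|$ is a positive definite matrix, it can be verified that the quantities
$\sigma_{(0)}^{11}, \sigma_{(1)}^{22}, \ldots, \sigma_{(k-1)}^{kk}$ are all positive.

The procedure we have just outlined actually exhibits a linear
transformation for reducing the positive definite quadratic form in the exponent
of (7.4.3) to a sum of squares and is known as Lagrange's method. There
is, of course, a family of linear transformations which will yield such a
reduction.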
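Lagrange's method of reducing a positive definite quadratic form to a sum of squares can be carried out mechanically. The sketch below works in exact rational arithmetic and records, at each stage, the leading coefficient of the remaining form (the quantity under the square root in the definition of the corresponding $z$) together with the coefficients of the later variables inside the square. The function names and the return layout are illustrative assumptions, and the routine does not check symmetry of its input.

```python
from fractions import Fraction


def lagrange_reduction(a):
    """Reduce y'Ay to sum_i d_i * (y_i + sum_{j>i} c_ij y_j)^2.

    Returns (pivots, coeffs) with pivots[i] = d_i and coeffs[i] = [c_ij for j > i].
    """
    k = len(a)
    m = [[Fraction(v) for v in row] for row in a]
    pivots, coeffs = [], []
    for p in range(k):
        d = m[p][p]
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        pivots.append(d)
        coeffs.append([m[p][j] / d for j in range(p + 1, k)])
        # Remove the completed square from the remaining quadratic form.
        for i in range(p + 1, k):
            for j in range(p + 1, k):
                m[i][j] -= m[p][i] * m[p][j] / d
    return pivots, coeffs


def quadratic_form(a, y):
    """Evaluate y'Ay directly."""
    n = len(y)
    return sum(Fraction(a[i][j]) * y[i] * y[j] for i in range(n) for j in range(n))


def sum_of_squares(pivots, coeffs, y):
    """Evaluate sum_i z_i^2, where z_i^2 = d_i * (y_i + sum_{j>i} c_ij y_j)^2."""
    total = Fraction(0)
    for i, d in enumerate(pivots):
        w = y[i] + sum(c * y[i + 1 + t] for t, c in enumerate(coeffs[i]))
        total += d * w * w
    return total
```

For the matrix with rows $(4, 2, 0)$, $(2, 5, 1)$, $(0, 1, 3)$ the successive leading coefficients are $4$, $4$, $11/4$, and their product $44$ equals the determinant of the matrix, in line with the role these quantities play in evaluating the normalizing constant.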

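The Jacobian of the transformation (7.4.4) is

$\left|\dfrac{\partial(y_1, \ldots, y_k)}{\partial(z_1, \ldots, z_k)}\right| = 1 \Big/ \left|\dfrac{\partial(z_1, \ldots, z_k)}{\partial(y_1, \ldots, y_k)}\right| = \dfrac{1}{\sqrt{\sigma_{(0)}^{11}\sigma_{(1)}^{22} \cdots \sigma_{(k-1)}^{kk}}}$

and hence the left side of (7.4.3) reduces to

(7.4.5)    $\dfrac{1}{\sqrt{\sigma_{(0)}^{11}\sigma_{(1)}^{22} \cdots \sigma_{(k-1)}^{kk}}} \int_{R_k} \exp\left(-\tfrac{1}{2}\sum_{i=1}^{k} z_i^2\right) dz_1 \cdots dz_k.$

By making use of (7.2.3a) it is seen that the $k$-fold integral in (7.4.5) has
the value $(2\pi)^{k/2}$. Our problem, therefore, reduces to showing that

(7.4.6)    $\sigma_{(0)}^{11}\sigma_{(1)}^{22} \cdots \sigma_{(k-1)}^{kk} = |\sigma^{ij}|.$

To establish (7.4.6), we evaluate the determinant $|\sigma^{ij}|$ as follows:

(7.4.7)    $|\sigma^{ij}| = \begin{vmatrix} \sigma^{11} & \sigma^{12} & \cdots & \sigma^{1k} \\ \sigma^{21} & \sigma^{22} & \cdots & \sigma^{2k} \\ \vdots & \vdots & & \vdots \\ \sigma^{k1} & \sigma^{k2} & \cdots & \sigma^{kk} \end{vmatrix} = \sigma^{11} \begin{vmatrix} 1 & \sigma^{12} & \cdots & \sigma^{1k} \\ \sigma^{21}/\sigma^{11} & \sigma^{22} & \cdots & \sigma^{2k} \\ \vdots & \vdots & & \vdots \\ \sigma^{k1}/\sigma^{11} & \sigma^{k2} & \cdots & \sigma^{kk} \end{vmatrix}.$

Multiplying the first column of the determinant on the extreme right by
$\sigma^{12}$ and subtracting from the second column, the first column by $\sigma^{13}$ and
subtracting from the third, and so on, we obtain

(7.4.8)    $|\sigma^{ij}| = \sigma_{(0)}^{11} \begin{vmatrix} 1 & 0 & \cdots & 0 \\ \sigma^{21}/\sigma^{11} & \sigma_{(1)}^{22} & \cdots & \sigma_{(1)}^{2k} \\ \vdots & \vdots & & \vdots \\ \sigma^{k1}/\sigma^{11} & \sigma_{(1)}^{k2} & \cdots & \sigma_{(1)}^{kk} \end{vmatrix} = \sigma_{(0)}^{11} \begin{vmatrix} \sigma_{(1)}^{22} & \cdots & \sigma_{(1)}^{2k} \\ \vdots & & \vdots \\ \sigma_{(1)}^{k2} & \cdots & \sigma_{(1)}^{kk} \end{vmatrix}.$

Continuing the process described above, we find

$|\sigma^{ij}| = \sigma_{(0)}^{11}\sigma_{(1)}^{22} \cdots \sigma_{(k-1)}^{kk}$

which establishes (7.4.6).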

To see that the $\mu_i$ are actually the means of the random variables having
p.d.f. (7.4.1), we proceed as in the two-dimensional case, by taking the
first partial derivatives of

$\int_{R_k} f(x_1, \ldots, x_k)\, dx_1 \cdots dx_k = 1$

with respect to $\mu_i$, $i = 1, \ldots, k$. This gives the following equations:

$\mathscr{E}\left[\sum_{j=1}^{k} \sigma^{ij}(x_j - \mu_j)\right] = 0, \qquad i = 1, \ldots, k$

which can be expressed as

(7.4.9)    $\sum_{j=1}^{k} \sigma^{ij}\mathscr{E}(x_j - \mu_j) = 0, \qquad i = 1, \ldots, k.$

Since we have assumed $\|\sigma_{ij}\|$ and hence $\|\sigma^{ij}\|$ to be positive definite, we
have $|\sigma^{ij}| \neq 0$, and hence the only solution of (7.4.9) is

(7.4.10)    $\mathscr{E}(x_i - \mu_i) = 0$, or $\mathscr{E}(x_i) = \mu_i$, $i = 1, \ldots, k$,

that is, $\mu_1, \ldots, \mu_k$ in (7.4.1) are the means of $x_1, \ldots, x_k$, respectively.

To verify that $\|\sigma^{ij}\|$ is the inverse of the covariance matrix, we use the
relation

(7.4.11)    $\int_{R_k} e^{-\frac{1}{2}Q(x_1, \ldots, x_k)}\, dx_1 \cdots dx_k = \dfrac{(2\pi)^{k/2}}{\sqrt{|\sigma^{ij}|}}.$

Differentiating both sides of (7.4.11) with respect to $\sigma^{ij}$ and then multiplying
by $-(1 + \delta_{ij})\sqrt{|\sigma^{ij}|}/(2\pi)^{k/2}$, where $\delta_{ij}$ is the Kronecker $\delta$, we obtain

(7.4.12)    $\mathscr{E}[(x_i - \mu_i)(x_j - \mu_j)] = \sigma_{ij}$

a formula which actually holds for $i = j$ as well as $i \neq j$. Therefore, we
have the following result:

7.4.1 The constants $\mu_1, \ldots, \mu_k$ of the quadratic form in (7.4.1) are the
means of $x_1, \ldots, x_k$, while the matrix $\|\sigma^{ij}\|$ of the quadratic form
is the inverse of the covariance matrix $\|\sigma_{ij}\|$.

If one denotes the variance of $x_i$ by $\sigma_i^2$ and the covariance between $x_i$
and $x_j$ by $\sigma_i\sigma_j\rho_{ij}$, then (7.4.1) can, of course, be written in a form, although
cumbersome, which is the $k$-variate analogue of (7.3.12). But this is left
as an exercise for the reader.

(b) Characteristic Function of the k-Variate Normal Distribution

Now let us consider the characteristic function of the $k$-variate normal
distribution. We have

(7.4.13)    $\varphi(t_1, \ldots, t_k) = \dfrac{\sqrt{|\sigma^{ij}|}}{(2\pi)^{k/2}} \int_{R_k} \exp\left[i\sum_{j=1}^{k} t_jx_j - \tfrac{1}{2}Q(x_1, \ldots, x_k)\right] dx_1 \cdots dx_k$

where $Q(x_1, \ldots, x_k)$ is given by (7.4.2).

To evaluate the integral (7.4.13) we carry out the $k$-variate version of
“completing the square” which was performed in passing from (7.2.4) to
(7.2.4a). For this purpose, putting $y_i = x_i - \mu_i$, $i = 1, \ldots, k$, we observe
that

(7.4.14)    $\sum_{i,j=1}^{k} \sigma^{ij}\left[y_i - i\sum_{g=1}^{k}\sigma_{ig}t_g\right]\left[y_j - i\sum_{h=1}^{k}\sigma_{jh}t_h\right]$

            $= \sum_{i,j=1}^{k}\sigma^{ij}y_iy_j - i\sum_{i,j,h=1}^{k}\sigma^{ij}\sigma_{jh}y_it_h - i\sum_{i,j,g=1}^{k}\sigma^{ij}\sigma_{ig}y_jt_g - \sum_{i,j,g,h=1}^{k}\sigma^{ij}\sigma_{ig}\sigma_{jh}t_gt_h.$

Since $\sigma^{ij} = \sigma^{ji}$ and, of course, $\sigma_{ij} = \sigma_{ji}$, the quantities $\sum_{j=1}^{k}\sigma^{ij}\sigma_{jh}$ and
$\sum_{i=1}^{k}\sigma^{ij}\sigma_{ig}$ are Kronecker deltas $\delta_{ih}$ and $\delta_{jg}$, respectively, and hence the second
member of (7.4.14) reduces to

$Q(x_1, \ldots, x_k) - 2i\sum_{i=1}^{k} t_iy_i - \sum_{i,j=1}^{k}\sigma_{ij}t_it_j.$

Therefore

(7.4.15)    $Q(x_1, \ldots, x_k) - 2i\sum_{i=1}^{k} t_iy_i = Q'(x_1, \ldots, x_k) + \sum_{i,j=1}^{k}\sigma_{ij}t_it_j$

where $Q'(x_1, \ldots, x_k)$ is the quadratic form which constitutes the first
member of (7.4.14). We may therefore write (7.4.13) as

(7.4.16)    $\varphi(t_1, \ldots, t_k) = \exp\left(i\sum_{j=1}^{k}\mu_jt_j - \tfrac{1}{2}\sum_{i,j=1}^{k}\sigma_{ij}t_it_j\right) \cdot N$

where

$N = \dfrac{\sqrt{|\sigma^{ij}|}}{(2\pi)^{k/2}} \int_{R_k} e^{-\frac{1}{2}Q'(x_1, \ldots, x_k)}\, dx_1 \cdots dx_k.$

$Q'(x_1, \ldots, x_k)$ is a quadratic form with matrix $\|\sigma^{ij}\|$ in the complex
variables $(x_i - \mu_i) + iB_i$, $i = 1, \ldots, k$, where the $B_i$ are real, whereas
$Q(x_1, \ldots, x_k)$ is the quadratic form obtained by setting the $B_i = 0$ in
$Q'(x_1, \ldots, x_k)$. Thus, if we denote $x_i - \mu_i + iB_i$ by $y_i'$ it is clear that
$Q'(x_1, \ldots, x_k)$ will be reduced to a sum of squares $\sum_{i=1}^{k} z_i^2$ by the
transformation (7.4.4) with $y_i$ replaced by $y_i'$. The $z_i$ in this case will, of course, be
complex variables, and the integration in the case of each $z_i$ is parallel to
the real axis of that complex variable. But as we have pointed out in the
cases of the one- and two-dimensional normal distributions, such integrals
have the same values as the integrals of the same functions along the
corresponding real axes. Hence $N$ has the same value as that obtained
by replacing $Q'(x_1, \ldots, x_k)$ by $Q(x_1, \ldots, x_k)$, namely, 1. Therefore

7.4.2 The characteristic function of the distribution $N(\{\mu_i\}, \|\sigma_{ij}\|)$,
$i, j = 1, \ldots, k$ is

(7.4.17)    $\varphi(t_1, \ldots, t_k) = \exp\left(i\sum_{j=1}^{k}\mu_jt_j - \tfrac{1}{2}\sum_{i,j=1}^{k}\sigma_{ij}t_it_j\right).$

If in this characteristic function we put $t_{k_1+1} = \cdots = t_k = 0$ we have

(7.4.18)    $\varphi(t_1, \ldots, t_{k_1}, 0, \ldots, 0) = \exp\left(i\sum_{j=1}^{k_1}\mu_jt_j - \tfrac{1}{2}\sum_{i,j=1}^{k_1}\sigma_{ij}t_it_j\right).$

Hence:

7.4.3 If $(x_1, \ldots, x_k)$ is a vector random variable having the $k$-variate
normal distribution $N(\{\mu_i\}, \|\sigma_{ij}\|)$, $i, j = 1, \ldots, k$, the marginal
distribution of $(x_1, \ldots, x_{k_1})$, $(k_1 < k)$, is the $k_1$-variate normal
distribution $N(\{\mu_i\}, \|\sigma_{ij}\|)$, $i, j = 1, \ldots, k_1$.

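(c) Distribution of Linear Functions of Normal Variables

If in (7.4.17) we put $t_i = c_it$, $i = 1, \ldots, k$ where the $c_i$ are not all zero,
we obtain, as stated by (5.3.6), the c.f. $\varphi(t)$ of the linear function $L = \sum_{i=1}^{k} c_ix_i$
as follows:

(7.4.19)    $\varphi(t) = \exp\left[i\left(\sum_{i=1}^{k} c_i\mu_i\right)t - \tfrac{1}{2}\left(\sum_{i,j=1}^{k}\sigma_{ij}c_ic_j\right)t^2\right]$

and making use of 7.2.1 we have the following result:

7.4.4 If $(x_1, \ldots, x_k)$ has the $k$-variate distribution $N(\{\mu_i\}, \|\sigma_{ij}\|)$, $i, j = 1, \ldots, k$,
then $L = c_1x_1 + \cdots + c_kx_k$ has the distribution

$N\!\left(\sum_{i=1}^{k} c_i\mu_i,\; \sum_{i,j=1}^{k}\sigma_{ij}c_ic_j\right).$

Similarly, it can be shown that

7.4.5 More generally, if $L_p = c_{p1}x_1 + \cdots + c_{pk}x_k$, $p = 1, \ldots, s$, $s \le k$,
are linearly independent random variables, where $(x_1, \ldots, x_k)$ has
the distribution $N(\{\mu_i\}, \|\sigma_{ij}\|)$, $i, j = 1, \ldots, k$, then $(L_1, \ldots, L_s)$
have the $s$-dimensional distribution

$N\!\left(\left\{\sum_{i=1}^{k} c_{pi}\mu_i\right\},\; \left\|\sum_{i,j=1}^{k}\sigma_{ij}c_{pi}c_{qj}\right\|\right), \qquad p, q = 1, \ldots, s.$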

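(d) Conditional Distributions from k-Variable Normal Distribution

Finally, we consider the p.d.f. of the conditional random variable $x_k \mid x_1, \ldots, x_{k-1}$ in the k-variable normal distribution. By definition

(7.4.20) $f(x_k \mid x_1, \ldots, x_{k-1}) = \dfrac{f(x_1, \ldots, x_k)}{f_{12 \cdots (k-1)}(x_1, \ldots, x_{k-1})}$

where $f(x_1, \ldots, x_k)$ is the p.d.f. (7.4.1) and $f_{12 \cdots (k-1)}(x_1, \ldots, x_{k-1})$ is the marginal p.d.f. of $x_1, \ldots, x_{k-1}$, that is,

(7.4.21) $f_{12 \cdots (k-1)}(x_1, \ldots, x_{k-1}) = \dfrac{\sqrt{|\sigma_{(k)}^{pq}|}}{(2\pi)^{(k-1)/2}} \exp\Bigl[ -\tfrac{1}{2} \sum_{p,q=1}^{k-1} \sigma_{(k)}^{pq} (x_p - \mu_p)(x_q - \mu_q) \Bigr]$

where $\|\sigma_{(k)}^{pq}\|$ is the inverse of the matrix $\|\sigma_{pq}\|$, $p, q = 1, \ldots, k - 1$. Note that in the case of the matrix $\|\sigma_{ij}\|$ and its inverse $\|\sigma^{ij}\|$, we have $i, j = 1, \ldots, k$.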

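Now putting $y_i = x_i - \mu_i$, $i = 1, \ldots, k$, it can be verified that

(7.4.22) $\sum_{i,j=1}^{k} \sigma^{ij} y_i y_j = \sigma^{kk} \Bigl[ y_k - \sum_{p=1}^{k-1} \beta_p y_p \Bigr]^2 + \sum_{p,q=1}^{k-1} \sigma_{(k)}^{pq} y_p y_q$

where

(7.4.23) $\beta_p = \sum_{q=1}^{k-1} \sigma_{(k)}^{pq} \sigma_{qk}, \quad p = 1, \ldots, k - 1.$

Also we have

(7.4.24) $\sigma^{kk} = \dfrac{|\sigma_{pq}|}{|\sigma_{ij}|} = \dfrac{|\sigma^{ij}|}{|\sigma_{(k)}^{pq}|}$

where $i, j = 1, \ldots, k$ and $p, q = 1, \ldots, k - 1$.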

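Substituting the expressions for $f(x_1, \ldots, x_k)$ and for $f_{1 \cdots (k-1)}(x_1, \ldots, x_{k-1})$ into (7.4.20) and making use of (7.4.22) and (7.4.24), we obtain

(7.4.25) $f(x_k \mid x_1, \ldots, x_{k-1}) = \sqrt{\dfrac{\sigma^{kk}}{2\pi}} \exp\Bigl\{ -\tfrac{1}{2} \sigma^{kk} \Bigl[ x_k - \mu_k - \sum_{p=1}^{k-1} \beta_p (x_p - \mu_p) \Bigr]^2 \Bigr\}.$

Therefore, we have the following result:

7.4.6 If $(x_1, \ldots, x_k)$ is a vector random variable having the k-variate distribution $N(\{\mu_i\}, \|\sigma_{ij}\|)$, $i, j = 1, \ldots, k$, then the distribution of the conditional random variable $x_k \mid x_1, \ldots, x_{k-1}$ is

(7.4.26) $N\Bigl( \mu_k + \sum_{p=1}^{k-1} \beta_p (x_p - \mu_p), \ \dfrac{1}{\sigma^{kk}} \Bigr)$

where the $\beta_p$ are given by (7.4.23).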

It should be noted that $\mu(x_k \mid x_1, \ldots, x_{k-1})$ from (7.4.25) is linear in $x_1, \ldots, x_{k-1}$ and is identical with the least squares regression plane of $x_k$ on $x_1, \ldots, x_{k-1}$ given by expression (3.8.18). Furthermore, $\sigma^2(x_k \mid x_1, \ldots, x_{k-1})$ in (7.4.25) has a value, namely, $1/\sigma^{kk}$, which is identical with that of the least squares residual variance of $x_k$ on $x_1, \ldots, x_{k-1}$ as given in (3.8.25). Thus, we have established the important fact that

7.4.7 In the case of a k-variate normal distribution the mean and variance of the conditional random variable $x_k \mid x_1, \ldots, x_{k-1}$ are identical, respectively, with the least squares regression function and residual variance of $x_k$ on $x_1, \ldots, x_{k-1}$.
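As an informal numerical check of 7.4.6 and 7.4.7, the sketch below takes an arbitrary positive definite covariance matrix (the entries of `sigma` are illustrative assumptions, not from the text) and computes the regression coefficients in two ways: from the inverse of the leading $(k-1) \times (k-1)$ block applied to the last column of the covariance matrix, and as $-\sigma^{pk}/\sigma^{kk}$ from the full inverse. It also compares $1/\sigma^{kk}$ with the least squares residual variance. The helper `inverse` is a hypothetical name for a small Gauss-Jordan routine.

```python
def inverse(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity on the right.
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Illustrative covariance matrix ||sigma_ij|| (assumed values, positive definite).
sigma = [[4.0, 1.2, 0.8],
         [1.2, 2.0, 0.5],
         [0.8, 0.5, 1.5]]
k = len(sigma)

sigma_inv = inverse(sigma)                               # ||sigma^{ij}||
block_inv = inverse([row[:k - 1] for row in sigma[:k - 1]])  # inverse of leading block

# Regression coefficients from the leading block and the last column of sigma.
beta = [sum(block_inv[p][q] * sigma[q][k - 1] for q in range(k - 1))
        for p in range(k - 1)]
# The same coefficients read off the full inverse matrix.
beta_alt = [-sigma_inv[p][k - 1] / sigma_inv[k - 1][k - 1] for p in range(k - 1)]

# Conditional variance 1/sigma^{kk} versus the least squares residual variance.
cond_var = 1.0 / sigma_inv[k - 1][k - 1]
resid_var = sigma[k - 1][k - 1] - sum(beta[p] * sigma[p][k - 1] for p in range(k - 1))

print(beta, beta_alt)
print(cond_var, resid_var)
```

The two coefficient vectors and the two variances should agree up to rounding error for any positive definite choice of `sigma`.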

7.5 THE GAMMA DISTRIBUTION 

In general, the gamma function $\Gamma(g)$ is defined for complex numbers g, whose real part is positive, by the definite integral

(7.5.1) $\Gamma(g) = \int_0^\infty x^{g-1} e^{-x} \, dx.$

Actually, we shall be interested only in real and positive values of g.

If we integrate by parts it follows that

$\Gamma(g) = (g - 1)\Gamma(g - 1),$

from which it is evident that if g is a positive integer

$\Gamma(g) = (g - 1)!.$

If $g > 0$, but not an integer, we have

$\Gamma(g) = (g - 1)(g - 2) \cdots \delta \, \Gamma(\delta)$

where $0 < \delta < 1$. In particular, $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$. This can be verified by noting that $I = \sqrt{2} \, \Gamma(\tfrac{1}{2})$, where I is defined in (7.2.3).

In problems of statistical theory, the most important values of g, as we shall see in later chapters, are (positive) multiples of $\tfrac{1}{2}$.
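The recursion $\Gamma(g) = (g-1)\Gamma(g-1)$ and the value $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$ can be checked numerically. The following is a minimal sketch using the standard library function `math.gamma`; the function name `gamma_via_recursion` and the test value 4.5 are illustrative assumptions.

```python
import math


def gamma_via_recursion(g):
    """Reduce Gamma(g), g > 0, to Gamma(delta) with 0 < delta <= 1
    by repeated use of Gamma(g) = (g - 1) * Gamma(g - 1)."""
    product = 1.0
    while g > 1.0:
        g -= 1.0
        product *= g
    return product * math.gamma(g)


print(gamma_via_recursion(4.5), math.gamma(4.5))   # both give Gamma(4.5)
print(gamma_via_recursion(5.0), math.factorial(4)) # Gamma(5) = 4!
print(math.gamma(0.5), math.sqrt(math.pi))         # Gamma(1/2) = sqrt(pi)
```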

A distribution function which arises frequently in problems dealing with sums of squares and quadratic forms of random variables having normal distributions is the gamma distribution which for $\mu > 0$ has p.d.f.

(7.5.2) $f(x) = \dfrac{x^{\mu-1} e^{-x}}{\Gamma(\mu)}$

for $x > 0$ and $f(x) = 0$ otherwise. It will be convenient to refer to a distribution having this p.d.f. as the gamma distribution $G(\mu)$. It is also called a Pearson Type III distribution (see Pearson (1906)).

The rth moment of (7.5.2) is

(7.5.3) $\mu'_r = \dfrac{1}{\Gamma(\mu)} \int_0^\infty x^{\mu+r-1} e^{-x} \, dx = \dfrac{\Gamma(\mu + r)}{\Gamma(\mu)}$

from which we find

(7.5.4) $\mathcal{E}(x) = \mu, \quad \sigma^2(x) = \mu,$

that is, the mean and variance of the gamma distribution are both equal to $\mu$. This property, it will be recalled, is also true of the Poisson distribution (6.4.1), which, of course, is a discrete distribution.
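The step from the moment formula to the mean and variance can be made explicit: the rth moment is $\Gamma(\mu + r)/\Gamma(\mu)$, so the first moment is $\mu$ and the second is $\mu(\mu + 1)$. A short sketch of this arithmetic follows; the parameter value 3.7 and the name `gamma_raw_moment` are arbitrary illustrative choices.

```python
import math


def gamma_raw_moment(mu, r):
    """r-th moment about the origin of the gamma distribution G(mu)."""
    return math.gamma(mu + r) / math.gamma(mu)


mu = 3.7  # illustrative parameter value
mean = gamma_raw_moment(mu, 1)
variance = gamma_raw_moment(mu, 2) - mean ** 2  # mu*(mu + 1) - mu**2

print(mean, variance)  # both should equal mu up to rounding
```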

By putting $\mu = p + 1$ in (7.5.2), the function

$I(u, p) = \dfrac{1}{\Gamma(p + 1)} \int_0^{u\sqrt{p+1}} x^p e^{-x} \, dx$

is called the incomplete gamma function.

Karl Pearson (1922) has tabulated it for combinations of values of u and p, with u ranging from 0 to 12 and p ranging from 0 to 50.

Now consider the characteristic function of the gamma distribution $G(\mu)$,

(7.5.5) $\varphi(t) = \dfrac{1}{\Gamma(\mu)} \int_0^\infty x^{\mu-1} e^{-x(1 - it)} \, dx.$

Putting $x(1 - it) = y$, we see that

(7.5.6) $\varphi(t) = (1 - it)^{-\mu}.$

If we regard $\mu$ as a parameter in the gamma distribution it is evident that the characteristic function satisfies (5.3.7) and hence

7.5.1 The gamma distribution $G(\mu)$ is reproductive with respect to $\mu$.

As we have already stated, one of the most important applications of the gamma distribution arises in dealing with the problem of finding the distribution of certain quadratic forms of normally distributed random variables. In the general case of k normally distributed random variables having p.d.f. (7.4.1) the quadratic form in which we are interested is the one in the exponent of the p.d.f. itself, namely, $Q(x_1, \ldots, x_k)$, as defined by (7.4.2).

Consider the characteristic function of $\tfrac{1}{2} Q(x_1, \ldots, x_k)$. It is defined by

(7.5.7) $\varphi(t) = \dfrac{\sqrt{|\sigma^{ij}|}}{(2\pi)^{k/2}} \int_{R_k} \exp\Bigl[ -\tfrac{1}{2}(1 - it) \sum_{i,j=1}^{k} \sigma^{ij} (x_i - \mu_i)(x_j - \mu_j) \Bigr] dx_1 \cdots dx_k.$

If we make the transformation $y_i = \sqrt{1 - it} \, (x_i - \mu_i)$, $i = 1, \ldots, k$, (7.5.7) reduces to

(7.5.8) $\varphi(t) = (1 - it)^{-k/2} \dfrac{\sqrt{|\sigma^{ij}|}}{(2\pi)^{k/2}} \int_{R_k} \exp\Bigl( -\tfrac{1}{2} \sum_{i,j=1}^{k} \sigma^{ij} y_i y_j \Bigr) dy_1 \cdots dy_k.$

Making use of (7.4.3), we find that

(7.5.9) $\varphi(t) = (1 - it)^{-k/2}$

which is the characteristic function of the gamma distribution $G(k/2)$. Therefore,

7.5.2 If $(x_1, \ldots, x_k)$ is a vector random variable having the k-variate distribution $N(\{\mu_i\}, \|\sigma_{ij}\|)$, $i, j = 1, \ldots, k$, then $\tfrac{1}{2} Q(x_1, \ldots, x_k)$ has the gamma distribution $G(k/2)$.

Another type of problem in which the gamma distribution arises relates to continuous waiting-time distributions. It will be recalled from Section 6.5 that (6.5.14) represents the probability of having to wait through x trials before obtaining k C's. Now suppose we think of C's as events spaced in time, and consider how long we must wait in order to obtain k C's. Let $(0, t)$ be the time interval during which exactly $k - 1$ C's occurred, and let $(t, t + \Delta t)$ be an increment of time during which the kth C occurs. If we cut up the interval $(0, t)$ into intervals each of length $\Delta t$, there will be $t/\Delta t$ of such intervals. Now suppose the probability of a C occurring in a specified time interval $\Delta t$ is $\lambda \Delta t + o(\Delta t)$ where $\lambda$ is a constant, namely, the average number of C's per unit time. Furthermore, assume that the occurrence of any number of C's in a time interval I is independent of the occurrence of any number of C's in a time interval I' which does not overlap I. We want the probability of having to wait through the time interval $(t, t + \Delta t)$ in order for the last one of k C's to occur. This probability is the product of the probability of $k - 1$ C's occurring among the $t/\Delta t$ intervals into which we have cut $(0, t)$ and the probability of the kth C occurring during $(t, t + \Delta t)$. Except for terms of order $(\Delta t)^2$ and higher, this probability is obtained from (6.5.15) by putting $x = t/\Delta t$ and $p = \lambda \Delta t$. This gives

(7.5.10) $\dbinom{t/\Delta t}{k - 1} (\lambda \Delta t)^k (1 - \lambda \Delta t)^{t/\Delta t - k + 1} = \dfrac{t (t - \Delta t)(t - 2\Delta t) \cdots (t - (k - 2)\Delta t)}{(k - 1)!} \, \lambda^k (1 - \lambda \Delta t)^{t/\Delta t - k + 1} \, \Delta t.$

Dividing by $\Delta t$ and allowing $\Delta t \to 0$, we obtain the p.d.f.

(7.5.11) $f(t) = \dfrac{\lambda^k t^{k-1} e^{-\lambda t}}{(k - 1)!},$

for $t > 0$ and $f(t) = 0$ otherwise.

This is the p.d.f. of the waiting time required in order to obtain the occurrence of exactly k C's, under the assumptions we have made.

It is clear from (7.5.11) that $\lambda t$ is a random variable having the gamma distribution $G(k)$.
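The passage to the limit can be illustrated numerically. The sketch below evaluates the discrete probability of $k - 1$ C's among $t/\Delta t$ intervals followed by a C in the next interval, divides by $\Delta t$, and compares the result with the limiting density $\lambda^k t^{k-1} e^{-\lambda t}/(k-1)!$. The function names and the values of `t`, `k`, and `lam` are illustrative assumptions.

```python
import math


def discrete_density(t, k, lam, dt):
    """Binomial probability of k-1 events among t/dt slots, times the
    probability lam*dt of an event in the next slot, per unit time."""
    m = round(t / dt)          # number of slots of length dt in (0, t)
    p = lam * dt               # probability of an event in one slot
    prob = math.comb(m, k - 1) * p ** (k - 1) * (1 - p) ** (m - k + 1) * p
    return prob / dt


def limiting_density(t, k, lam):
    """Limiting waiting-time density lam^k t^(k-1) e^(-lam t) / (k-1)!."""
    return lam ** k * t ** (k - 1) * math.exp(-lam * t) / math.factorial(k - 1)


t, k, lam = 2.0, 3, 1.5  # illustrative values
for dt in (1e-2, 1e-3, 1e-4):
    print(dt, discrete_density(t, k, lam, dt), limiting_density(t, k, lam))
```

As `dt` shrinks, the first printed value approaches the second.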

Finally, it will be useful to state the following result, leaving verification 
to the reader. 


7.5.3 If x is a random variable having the rectangular distribution $R(\tfrac{1}{2}, 1)$, then the random variable $y = -\log x$ has the gamma distribution $G(1)$.
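A simulation gives a quick plausibility check of 7.5.3. This is only a sketch: it assumes that $R(\tfrac{1}{2}, 1)$ denotes the rectangular distribution on the unit interval, and it uses an arbitrary seed and sample size. Since $G(1)$ has mean 1 and c.d.f. $1 - e^{-y}$, the sample mean and the fraction of values not exceeding 1 are compared with those quantities.

```python
import math
import random

rng = random.Random(12345)   # fixed seed so the run is reproducible
n = 100_000

# 1 - rng.random() lies in (0, 1], which avoids log(0).
sample = [-math.log(1.0 - rng.random()) for _ in range(n)]

sample_mean = sum(sample) / n
fraction_below_one = sum(1 for y in sample if y <= 1.0) / n

print(sample_mean)                               # close to 1, the mean of G(1)
print(fraction_below_one, 1.0 - math.exp(-1.0))  # close to 1 - e^{-1}
```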


7.6 THE BETA DISTRIBUTION 

Another continuous distribution which occurs frequently in statistical 
theory is the distribution defined by the following p.d.f. 

(7.6.1) $f(x) = \dfrac{\Gamma(\nu_1 + \nu_2)}{\Gamma(\nu_1)\Gamma(\nu_2)} x^{\nu_1 - 1} (1 - x)^{\nu_2 - 1}$

for $0 < x < 1$, and $f(x) = 0$ elsewhere, and where $\nu_1$ and $\nu_2$ are positive and real. We shall refer to a distribution having p.d.f. (7.6.1) as the beta distribution $Be(\nu_1, \nu_2)$.

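To verify that the integral of $f(x)$ over the interval (0, 1) is unity we proceed as follows. Making use of (7.5.1) we have

(7.6.2) $\Gamma(\nu_1)\Gamma(\nu_2) = \int_0^\infty \int_0^\infty x_1^{\nu_1 - 1} x_2^{\nu_2 - 1} e^{-(x_1 + x_2)} \, dx_1 \, dx_2.$

Applying the transformation

$x_1 = r^2 \cos^2 \theta$
$x_2 = r^2 \sin^2 \theta$

which has Jacobian

$\Bigl| \dfrac{\partial(x_1, x_2)}{\partial(r, \theta)} \Bigr| = 4 r^3 \sin\theta \cos\theta,$

we obtain

(7.6.3) $\Gamma(\nu_1)\Gamma(\nu_2) = 4 \int_0^{\pi/2} \int_0^\infty (\cos\theta)^{2\nu_1 - 1} (\sin\theta)^{2\nu_2 - 1} r^{2\nu_1 + 2\nu_2 - 1} e^{-r^2} \, dr \, d\theta.$

But applying the transformation $r = \sqrt{y}$ to

$2 \int_0^\infty r^{2\nu_1 + 2\nu_2 - 1} e^{-r^2} \, dr$

we find that this integral reduces to the definite integral which defines $\Gamma(\nu_1 + \nu_2)$. Therefore (7.6.3) reduces to the following:

(7.6.4) $\dfrac{\Gamma(\nu_1)\Gamma(\nu_2)}{\Gamma(\nu_1 + \nu_2)} = 2 \int_0^{\pi/2} (\cos\theta)^{2\nu_1 - 1} (\sin\theta)^{2\nu_2 - 1} \, d\theta.$

Finally, by making the change of variable $\cos\theta = \sqrt{x}$ in (7.6.4) we obtain

(7.6.5) $\dfrac{\Gamma(\nu_1)\Gamma(\nu_2)}{\Gamma(\nu_1 + \nu_2)} = \int_0^1 x^{\nu_1 - 1} (1 - x)^{\nu_2 - 1} \, dx$

which completes verification of the fact that the integral of $f(x)$ in (7.6.1) over the interval (0, 1) is unity.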
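The identity between the integral of $x^{\nu_1 - 1}(1 - x)^{\nu_2 - 1}$ over (0, 1) and the ratio $\Gamma(\nu_1)\Gamma(\nu_2)/\Gamma(\nu_1 + \nu_2)$ can be checked by crude numerical integration. The following sketch uses a midpoint rule; the parameter values 2.5 and 3.0 are illustrative and chosen at least 1 so that the integrand is bounded at the endpoints.

```python
import math


def beta_integral(v1, v2, steps=200_000):
    """Midpoint-rule approximation to the integral of
    x^(v1-1) * (1-x)^(v2-1) over (0, 1)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x ** (v1 - 1) * (1.0 - x) ** (v2 - 1)
    return h * total


def beta_from_gammas(v1, v2):
    """Gamma(v1) * Gamma(v2) / Gamma(v1 + v2)."""
    return math.gamma(v1) * math.gamma(v2) / math.gamma(v1 + v2)


v1, v2 = 2.5, 3.0  # illustrative parameters
print(beta_integral(v1, v2), beta_from_gammas(v1, v2))
```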

The c.d.f. $F(x)$ of the beta distribution $Be(\nu_1, \nu_2)$, designated as $I_x(\nu_1, \nu_2)$ and called the incomplete beta function, has been tabulated under the direction of Karl Pearson (1934) in The Tables of the Incomplete Beta Function for $x$ = 0.01 to 1.00 and for $\nu_1, \nu_2$ = 0.05 to 50.

The function of $\nu_1$ and $\nu_2$ defined by the definite integral on the right of (7.6.5) is called the Beta Function of $\nu_1$ and $\nu_2$ and is classically written as $B(\nu_1, \nu_2)$.

The beta distribution is another important example of a continuous distribution for which the characteristic function is awkward in determining moments. The simplest way to find the moments of (7.6.1) is by direct evaluation. Thus, if $\mu'_r$ is the rth moment of (7.6.1) we have

(7.6.6) $\mu'_r = \dfrac{\Gamma(\nu_1 + \nu_2)}{\Gamma(\nu_1)\Gamma(\nu_2)} \int_0^1 x^{\nu_1 + r - 1} (1 - x)^{\nu_2 - 1} \, dx.$

Using (7.6.5), we obtain the following expression for the rth moment

(7.6.7) $\mu'_r = \dfrac{\Gamma(\nu_1 + \nu_2)\Gamma(\nu_1 + r)}{\Gamma(\nu_1 + \nu_2 + r)\Gamma(\nu_1)},$

from which we find

(7.6.8) $\mathcal{E}(x) = \dfrac{\nu_1}{\nu_1 + \nu_2}, \quad \sigma^2(x) = \dfrac{\nu_1 \nu_2}{(\nu_1 + \nu_2)^2 (\nu_1 + \nu_2 + 1)}.$

There are various situations in which a beta distribution arises in 
mathematical statistics including the theory of order statistics and certain 
statistical tests. We shall discuss these situations in later chapters. One 
of the most important may be stated as follows: 


7.6.1 If $x_1$ and $x_2$ are independent random variables having gamma distributions $G(\nu_1)$ and $G(\nu_2)$ respectively, then the random variable

(7.6.9) $u = \dfrac{x_1}{x_1 + x_2}$

has the beta distribution $Be(\nu_1, \nu_2)$.

Verification of this statement is straightforward. The p.e. of $x_1$ and $x_2$ is

(7.6.10) $\dfrac{1}{\Gamma(\nu_1)\Gamma(\nu_2)} x_1^{\nu_1 - 1} x_2^{\nu_2 - 1} e^{-(x_1 + x_2)} \, dx_1 \, dx_2.$

If we apply the transformation

(7.6.11) $u = \dfrac{x_1}{x_1 + x_2}, \quad v = x_2$

to (7.6.10), we obtain as the p.e. of u and v

(7.6.12) $\dfrac{1}{\Gamma(\nu_1)\Gamma(\nu_2)} u^{\nu_1 - 1} v^{\nu_1 + \nu_2 - 1} (1 - u)^{-(\nu_1 + 1)} e^{-v/(1 - u)} \, du \, dv.$

The distribution of u is the marginal distribution of u in the distribution having p.e. (7.6.12). Integrating (7.6.12) with respect to v from 0 to $\infty$, we obtain the result that u has the beta distribution $Be(\nu_1, \nu_2)$.
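A simulation sketch of 7.6.1 follows. It draws independent gamma variables with the standard library call `random.gammavariate(shape, scale)` using unit scale, forms the ratio $x_1/(x_1 + x_2)$, and compares the sample mean and variance with the beta values $\nu_1/(\nu_1 + \nu_2)$ and $\nu_1\nu_2/[(\nu_1 + \nu_2)^2(\nu_1 + \nu_2 + 1)]$. The seed, sample size, and parameter values are arbitrary assumptions.

```python
import random

rng = random.Random(2024)   # fixed seed for reproducibility
v1, v2 = 2.0, 5.0           # illustrative gamma parameters
n = 200_000

u_values = []
for _ in range(n):
    x1 = rng.gammavariate(v1, 1.0)   # gamma distribution G(v1)
    x2 = rng.gammavariate(v2, 1.0)   # gamma distribution G(v2)
    u_values.append(x1 / (x1 + x2))

sample_mean = sum(u_values) / n
sample_var = sum((u - sample_mean) ** 2 for u in u_values) / n

expected_mean = v1 / (v1 + v2)
expected_var = v1 * v2 / ((v1 + v2) ** 2 * (v1 + v2 + 1.0))

print(sample_mean, expected_mean)
print(sample_var, expected_var)
```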

Remarks. Before leaving the subject of beta distributions it will be useful to 
present two important formulas concerning gamma functions due to Legendre 
and Stirling, respectively. These formulas can be obtained by performing some 
elementary analysis on beta distributions. 

Legendre’s Duplication Formula for Gamma Functions 
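Suppose we put $\nu_1 = \nu_2 = \nu$ in (7.6.5). Then

(7.6.13) $\dfrac{\Gamma^2(\nu)}{\Gamma(2\nu)} = \int_0^1 x^{\nu - 1} (1 - x)^{\nu - 1} \, dx = 2 \int_0^{1/2} (x - x^2)^{\nu - 1} \, dx.$

Making the change of variable $y = 4(x - x^2)$ for $0 < x < \tfrac{1}{2}$, that is $x = (1 - \sqrt{1 - y})/2$, we find

(7.6.14) $\dfrac{\Gamma^2(\nu)}{\Gamma(2\nu)} = 2^{1 - 2\nu} \int_0^1 y^{\nu - 1} (1 - y)^{-1/2} \, dy = 2^{1 - 2\nu} \dfrac{\Gamma(\nu)\Gamma(\tfrac{1}{2})}{\Gamma(\nu + \tfrac{1}{2})}.$

Remembering that $\Gamma(\tfrac{1}{2}) = \sqrt{\pi}$, we obtain Legendre's duplication formula for gamma functions:

(7.6.15) $\Gamma(2\nu) = \dfrac{2^{2\nu - 1}}{\sqrt{\pi}} \Gamma(\nu)\Gamma(\nu + \tfrac{1}{2})$

which we shall need in later chapters.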
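Legendre's duplication formula, $\Gamma(2\nu) = 2^{2\nu - 1} \Gamma(\nu)\Gamma(\nu + \tfrac{1}{2})/\sqrt{\pi}$, is easy to spot-check with `math.gamma`. The sketch below does so for a few arbitrarily chosen values of $\nu$; the function name `duplication_rhs` is an illustrative assumption.

```python
import math


def duplication_rhs(v):
    """Right-hand side of Legendre's duplication formula."""
    return 2.0 ** (2.0 * v - 1.0) / math.sqrt(math.pi) * math.gamma(v) * math.gamma(v + 0.5)


for v in (0.3, 1.0, 2.5, 7.0):   # illustrative values of nu
    print(v, math.gamma(2.0 * v), duplication_rhs(v))
```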

Stirling’s Formula for Large Factorials 

By using certain limit properties of a beta distribution we can establish by elementary analysis a highly useful approximation for $\Gamma(g)$ for large (real) values of g.

In (7.6.1) let $\nu_1 = g$ and $\nu_2 = n + 1$, and make the change of variable $x = y/n$. Then

(7.6.16) $\dfrac{\Gamma(g + n + 1)}{\Gamma(g)\Gamma(n + 1) \, n^g} \int_0^n y^{g - 1} \Bigl( 1 - \dfrac{y}{n} \Bigr)^n dy = 1$

for all values of n. Hence the limit of the left-hand side of (7.6.16) as $n \to \infty$ is 1. It is to be noted that the integrand in (7.6.16) converges uniformly to $y^{g - 1} e^{-y}$ as $n \to \infty$ on any finite interval $(0, K)$. By taking n sufficiently large, we can make the difference

$\int_0^{\min(n, K)} y^{g - 1} \Bigl( 1 - \dfrac{y}{n} \Bigr)^n dy - \int_0^K y^{g - 1} e^{-y} \, dy$

arbitrarily near zero. But K can be chosen at the outset so as to make

$\int_K^\infty y^{g - 1} e^{-y} \, dy$

arbitrarily small. Hence

(7.6.17) $\lim_{n \to \infty} \int_0^n y^{g - 1} \Bigl( 1 - \dfrac{y}{n} \Bigr)^n dy = \int_0^\infty y^{g - 1} e^{-y} \, dy = \Gamma(g).$

Therefore

$\Gamma(g) = \lim_{n \to \infty} \dfrac{\Gamma(g)\Gamma(n + 1) \, n^g}{\Gamma(g + n + 1)}$

which can be rewritten as

(7.6.18) $\Gamma(g) = \lim_{n \to \infty} \dfrac{n! \, n^g}{g(g + 1) \cdots (g + n)}.$

Now let

(7.6.19) $S_n(g) = g \log n + \sum_{a=1}^{n} \log a - \sum_{a=0}^{n} \log(g + a).$

Differentiating (7.6.19) with respect to g we find

(7.6.20) $S'_n(g) = \log n - \sum_{a=0}^{n} \dfrac{1}{g + a}.$

Then

$\lim_{n \to \infty} S_n(g) = \log \Gamma(g) \quad \text{and} \quad \lim_{n \to \infty} S'_n(g) = \dfrac{\Gamma'(g)}{\Gamma(g)}.$

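Now it will be seen from elementary calculus considerations that

(7.6.21) $-B_n(g) < -\sum_{a=0}^{n} \dfrac{1}{g + a} + \dfrac{1}{2g} + \dfrac{1}{2(g + n)} < -C_n(g)$

where

$B_n(g) = \dfrac{1}{2} \int_{1/2}^{n + 1/2} \Bigl( \dfrac{1}{g + x - 1} + \dfrac{1}{g + x} \Bigr) dx, \quad C_n(g) = \int_{1/2}^{n + 1/2} \dfrac{dx}{g + x - \tfrac{1}{2}}.$

Denoting $\log n - \tfrac{1}{2}\bigl( 1/g + 1/(g + n) \bigr)$ by $A_n(g)$, and adding $A_n(g)$ to all members of the inequality (7.6.21) we get

(7.6.22) $A_n(g) - B_n(g) < S'_n(g) < A_n(g) - C_n(g).$

Evaluating $A_n(g)$, $B_n(g)$, and $C_n(g)$ and taking limits of (7.6.22) as $n \to \infty$, we obtain

(7.6.23) $\tfrac{1}{2} \log\bigl( g^2 - \tfrac{1}{4} \bigr) - \dfrac{1}{2g} \le \dfrac{\Gamma'(g)}{\Gamma(g)} \le \log g - \dfrac{1}{2g}$

from which it is evident that

(7.6.24) $\dfrac{\Gamma'(g)}{\Gamma(g)} = \log g - \dfrac{1}{2g} + O\Bigl( \dfrac{1}{g^2} \Bigr).$

Integrating with respect to g, we find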

(7.6.25) $\log \Gamma(g) = g \log g - g - \tfrac{1}{2} \log g + C + O(1/g)$

where C is a constant independent of g. We can evaluate C by making use of formula (7.6.15). Replacing $\nu$ by g and taking logarithms, we have

(7.6.26) $\log \Gamma(2g) = (2g - 1) \log 2 + \log \Gamma(g) + \log \Gamma(g + \tfrac{1}{2}) - \tfrac{1}{2} \log \pi.$

Replacing $\log \Gamma(2g)$, $\log \Gamma(g)$, and $\log \Gamma(g + \tfrac{1}{2})$ by the expressions which would be given for them by (7.6.25) and allowing $g \to \infty$, we find $C = \tfrac{1}{2} \log 2\pi$. Finally, we obtain

(7.6.27) $\Gamma(g) = \sqrt{2\pi} \, g^{g - 1/2} e^{-g} \bigl( 1 + O(1/g) \bigr).$

Inequality (7.6.21) is not sharp enough to give us the coefficients of the various powers of $1/g$ in $O(1/g)$ in (7.6.27).

Actually $O(1/g) = 1/(12g) + 1/(288g^2) + O(1/g^3)$, but to establish this, stronger methods of analysis are required than those used here; for example, see Whittaker and Watson (1927) or Cramér (1946). For large values of g, the asymptotic formula

(7.6.28) $\Gamma(g) \sim \sqrt{2\pi} \, g^{g - 1/2} e^{-g}$

is sufficient for most purposes. If g is a positive integer, and since $\Gamma(g + 1) = g!$, we have Stirling's formula for large factorials

(7.6.29) $g! \sim \sqrt{2\pi g} \, g^g e^{-g}.$
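To see how good Stirling's approximation $\sqrt{2\pi g}\, g^g e^{-g}$ for $g!$ is in practice, the sketch below prints the ratio of $g!$ to the approximation for a few illustrative values of g, alongside the correction $1 + 1/(12g) + 1/(288g^2)$ quoted above. The function names are assumptions made for this example.

```python
import math


def stirling(g):
    """Stirling's approximation sqrt(2*pi*g) * g^g * e^(-g) to g!."""
    return math.sqrt(2.0 * math.pi * g) * float(g) ** g * math.exp(-g)


def ratio(g):
    """Ratio of the exact factorial to Stirling's approximation."""
    return math.factorial(g) / stirling(g)


for g in (10, 20, 50):   # illustrative values
    correction = 1.0 + 1.0 / (12.0 * g) + 1.0 / (288.0 * g * g)
    print(g, ratio(g), correction)
```

The ratio exceeds 1 and approaches 1 as g increases, at a rate governed mainly by the term $1/(12g)$.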

7.7 THE DIRICHLET DISTRIBUTION 
The k-variate analogue of (7.6.1) is the distribution having the p.d.f.

(7.7.1) $f(x_1, \ldots, x_k) = \dfrac{\Gamma(\nu_1 + \cdots + \nu_{k+1})}{\Gamma(\nu_1) \cdots \Gamma(\nu_{k+1})} x_1^{\nu_1 - 1} \cdots x_k^{\nu_k - 1} (1 - x_1 - \cdots - x_k)^{\nu_{k+1} - 1}$

at any point in the simplex $S_k = \{ (x_1, \ldots, x_k) : x_i \ge 0, \ i = 1, \ldots, k, \ \sum_{i=1}^{k} x_i \le 1 \}$ in $R_k$ and zero outside, and where the $\nu_i$ are all real and positive. We shall refer to a distribution having p.d.f. (7.7.1) as the k-variate Dirichlet distribution $D(\nu_1, \ldots, \nu_k; \nu_{k+1})$. Note that if $k = 1$, $D(\nu_1; \nu_2)$ is identical with $Be(\nu_1, \nu_2)$, that is, (7.7.1) reduces to the p.d.f. of the beta distribution $Be(\nu_1, \nu_2)$. The Dirichlet distribution is basic to the probability theory of order statistics as we shall see in Section 8.7.

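To verify that the integral of $f(x_1, \ldots, x_k)$ over the simplex $S_k$ is unity we apply the transformation

$x_1 = \theta_1$
$x_2 = \theta_2 (1 - \theta_1)$
$\cdots$
$x_k = \theta_k (1 - \theta_1) \cdots (1 - \theta_{k-1})$

to (7.7.1), and obtain

(7.7.2) $f(x_1, \ldots, x_k) \, dx_1 \cdots dx_k = \dfrac{\Gamma(\nu_1 + \cdots + \nu_{k+1})}{\Gamma(\nu_1) \cdots \Gamma(\nu_{k+1})} \, \theta_1^{\nu_1 - 1} (1 - \theta_1)^{\nu_2 + \cdots + \nu_{k+1} - 1} \, \theta_2^{\nu_2 - 1} (1 - \theta_2)^{\nu_3 + \cdots + \nu_{k+1} - 1} \cdots \theta_k^{\nu_k - 1} (1 - \theta_k)^{\nu_{k+1} - 1} \, d\theta_1 \cdots d\theta_k,$

where the range of the $\theta$'s is the k-dimensional unit cube $\{ (\theta_1, \ldots, \theta_k) : 0 < \theta_i < 1, \ i = 1, \ldots, k \}$. Making use of (7.6.5) we have

(7.7.3) $\int_{S_k} f(x_1, \ldots, x_k) \, dx_1 \cdots dx_k = \dfrac{\Gamma(\nu_1 + \cdots + \nu_{k+1})}{\Gamma(\nu_1) \cdots \Gamma(\nu_{k+1})} \cdot \dfrac{\Gamma(\nu_1)\Gamma(\nu_2 + \cdots + \nu_{k+1})}{\Gamma(\nu_1 + \cdots + \nu_{k+1})} \cdots \dfrac{\Gamma(\nu_k)\Gamma(\nu_{k+1})}{\Gamma(\nu_k + \nu_{k+1})}$

where it is to be noted that the right-hand side telescopes to unity.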

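The integral

(7.7.4) $\int_{S_k} x_1^{\nu_1 - 1} \cdots x_k^{\nu_k - 1} (1 - x_1 - \cdots - x_k)^{\nu_{k+1} - 1} \, dx_1 \cdots dx_k$

which therefore has the value

(7.7.5) $\dfrac{\Gamma(\nu_1) \cdots \Gamma(\nu_{k+1})}{\Gamma(\nu_1 + \cdots + \nu_{k+1})}$

was first investigated by Dirichlet (1839) and is known as the Dirichlet integral, and by analogy with the terminology introduced for the beta distribution (7.6.1) we may properly call the distribution having p.d.f. (7.7.1) the Dirichlet distribution $D(\nu_1, \ldots, \nu_k; \nu_{k+1})$.

It can be verified that the general moment $\mu'_{r_1 \cdots r_k}$ of the k-variate Dirichlet distribution has the following value

(7.7.6) $\mu'_{r_1 \cdots r_k} = \dfrac{\Gamma(\nu_1 + r_1) \cdots \Gamma(\nu_k + r_k) \, \Gamma(\nu_1 + \cdots + \nu_{k+1})}{\Gamma(\nu_1 + \cdots + \nu_{k+1} + r_1 + \cdots + r_k) \, \Gamma(\nu_1) \cdots \Gamma(\nu_k)}$

from which we find the means, variances, and covariances of the x's to be

(7.7.7)
$\mathcal{E}(x_i) = \dfrac{\nu_i}{\nu_1 + \cdots + \nu_{k+1}}, \quad i = 1, \ldots, k$

$\sigma^2(x_i) = \dfrac{\nu_i (\nu_1 + \cdots + \nu_{k+1} - \nu_i)}{(\nu_1 + \cdots + \nu_{k+1})^2 (\nu_1 + \cdots + \nu_{k+1} + 1)}, \quad i = 1, \ldots, k$

$\operatorname{cov}(x_i, x_j) = \dfrac{-\nu_i \nu_j}{(\nu_1 + \cdots + \nu_{k+1})^2 (\nu_1 + \cdots + \nu_{k+1} + 1)}, \quad i \ne j = 1, \ldots, k.$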

A k-variate version of 7.6.1 can be stated as follows:

7.7.1 Suppose $x_1, \ldots, x_{k+1}$ are independent random variables having gamma distributions $G(\nu_1), \ldots, G(\nu_{k+1})$. Let

(7.7.8) $y_i = \dfrac{x_i}{x_1 + \cdots + x_{k+1}}, \quad i = 1, \ldots, k.$

Then $(y_1, \ldots, y_k)$ has the k-variate Dirichlet distribution $D(\nu_1, \ldots, \nu_k; \nu_{k+1})$.
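Statement 7.7.1 suggests a simple way of sampling from a Dirichlet distribution: draw independent gamma variables and divide each by their sum. The sketch below does this for $k = 2$ with illustrative parameters $(\nu_1, \nu_2, \nu_3) = (1, 2, 3)$ and compares sample means and the sample covariance with the values $\nu_i/(\nu_1 + \nu_2 + \nu_3)$ and $-\nu_1\nu_2/[(\nu_1 + \nu_2 + \nu_3)^2(\nu_1 + \nu_2 + \nu_3 + 1)]$. The seed and sample size are arbitrary assumptions.

```python
import random

rng = random.Random(7)          # fixed seed for reproducibility
nu = (1.0, 2.0, 3.0)            # illustrative (nu_1, nu_2, nu_3), so k = 2
n = 200_000

pairs = []
for _ in range(n):
    xs = [rng.gammavariate(v, 1.0) for v in nu]   # independent G(nu_i)
    total = sum(xs)
    pairs.append((xs[0] / total, xs[1] / total))  # (y_1, y_2)

mean1 = sum(p[0] for p in pairs) / n
mean2 = sum(p[1] for p in pairs) / n
cov12 = sum((p[0] - mean1) * (p[1] - mean2) for p in pairs) / n

s = sum(nu)
expected_mean1 = nu[0] / s
expected_mean2 = nu[1] / s
expected_cov12 = -nu[0] * nu[1] / (s * s * (s + 1.0))

print(mean1, expected_mean1)
print(mean2, expected_mean2)
print(cov12, expected_cov12)
```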

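The argument for 7.7.1 is straightforward and will be left as an exercise for the reader. Suppose we put $r_{k_1 + 1} = \cdots = r_k = 0$ in the general moment (7.7.6), where $k_1 < k$. The resulting quantity is the general moment $\mu'_{r_1 \cdots r_{k_1}}$ of the marginal distribution of $(x_1, \ldots, x_{k_1})$ of the k-variate Dirichlet distribution $D(\nu_1, \ldots, \nu_k; \nu_{k+1})$ and it has the value

(7.7.9) $\mu'_{r_1 \cdots r_{k_1}} = \dfrac{\Gamma(\nu_1 + r_1) \cdots \Gamma(\nu_{k_1} + r_{k_1}) \, \Gamma(\nu_1 + \cdots + \nu_{k+1})}{\Gamma(\nu_1 + \cdots + \nu_{k+1} + r_1 + \cdots + r_{k_1}) \, \Gamma(\nu_1) \cdots \Gamma(\nu_{k_1})}.$

But this is the general moment of the $k_1$-variate Dirichlet distribution $D(\nu_1, \ldots, \nu_{k_1}; \nu_{k_1 + 1} + \cdots + \nu_{k+1})$ which by the multidimensional version of 5.5.1a is a uniquely determined distribution. Thus

7.7.2 If $(x_1, \ldots, x_k)$ is a vector random variable having the k-variate Dirichlet distribution $D(\nu_1, \ldots, \nu_k; \nu_{k+1})$, then the marginal distribution of $(x_1, \ldots, x_{k_1})$, $k_1 < k$, is the $k_1$-variate Dirichlet distribution $D(\nu_1, \ldots, \nu_{k_1}; \nu_{k_1 + 1} + \cdots + \nu_{k+1})$.

We are now able to write down the p.d.f. of the conditional random variable $x_k \mid x_1, \ldots, x_{k-1}$ for a k-variate Dirichlet distribution by taking the ratio of the p.d.f. of $D(\nu_1, \ldots, \nu_k; \nu_{k+1})$ to the p.d.f. of $D(\nu_1, \ldots, \nu_{k-1}; \nu_k + \nu_{k+1})$. This gives, after some simplification, the following expression for the p.e. of $x_k \mid x_1, \ldots, x_{k-1}$:

(7.7.10) $dF(x_k \mid x_1, \ldots, x_{k-1}) = \dfrac{\Gamma(\nu_k + \nu_{k+1})}{\Gamma(\nu_k)\Gamma(\nu_{k+1})} \Bigl( \dfrac{x_k}{1 - x_1 - \cdots - x_{k-1}} \Bigr)^{\nu_k - 1} \Bigl( 1 - \dfrac{x_k}{1 - x_1 - \cdots - x_{k-1}} \Bigr)^{\nu_{k+1} - 1} \dfrac{dx_k}{1 - x_1 - \cdots - x_{k-1}}.$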

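It is evident from (7.7.10) that:

7.7.3 If $(x_1, \ldots, x_k)$ is a vector random variable having the k-variate Dirichlet distribution $D(\nu_1, \ldots, \nu_k; \nu_{k+1})$, the conditional random variable $x_k \mid x_1, \ldots, x_{k-1}$ has the property that

$\dfrac{x_k}{1 - x_1 - \cdots - x_{k-1}}$

has the beta distribution $Be(\nu_k, \nu_{k+1})$.

Referring to (7.6.8) it is to be noted that the mean and variance of $x_k \mid x_1, \ldots, x_{k-1}$ are given by

(7.7.11)
$\mathcal{E}(x_k \mid x_1, \ldots, x_{k-1}) = \dfrac{\nu_k}{\nu_k + \nu_{k+1}} (1 - x_1 - \cdots - x_{k-1})$

$\sigma^2(x_k \mid x_1, \ldots, x_{k-1}) = \dfrac{\nu_k \nu_{k+1}}{(\nu_k + \nu_{k+1})^2 (\nu_k + \nu_{k+1} + 1)} (1 - x_1 - \cdots - x_{k-1})^2.$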

Another useful property of the k-variate Dirichlet distribution may be stated as follows:

7.7.4 If $(x_1, \ldots, x_k)$ is a vector random variable having the k-variate Dirichlet distribution $D(\nu_1, \ldots, \nu_k; \nu_{k+1})$, the sum $x_1 + \cdots + x_k$ has the beta distribution $Be(\nu_1 + \cdots + \nu_k, \nu_{k+1})$.

This follows from the fact that the rth moment of $1 - (x_1 + \cdots + x_k)$ is

$\dfrac{\Gamma(\nu_{k+1} + r) \, \Gamma(\nu_1 + \cdots + \nu_{k+1})}{\Gamma(\nu_1 + \cdots + \nu_{k+1} + r) \, \Gamma(\nu_{k+1})}$

which, by 5.5.1a, implies that $1 - (x_1 + \cdots + x_k)$ has the beta distribution $Be(\nu_{k+1}, \nu_1 + \cdots + \nu_k)$. Therefore $x_1 + \cdots + x_k$ has the beta distribution $Be(\nu_1 + \cdots + \nu_k, \nu_{k+1})$.

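More generally 7.7.4 can be extended as follows:

7.7.5 If $(x_1, \ldots, x_k)$ is a vector random variable having the k-variate Dirichlet distribution $D(\nu_1, \ldots, \nu_k; \nu_{k+1})$, then the random variable $(z_1, \ldots, z_s)$, where $z_1 = x_1 + \cdots + x_{k_1}$, $z_2 = x_{k_1 + 1} + \cdots + x_{k_1 + k_2}$, ..., $z_s = x_{k_1 + \cdots + k_{s-1} + 1} + \cdots + x_{k_1 + \cdots + k_s}$, and $k_1 + \cdots + k_s \le k$, has the s-variate Dirichlet distribution $D(\nu_{(1)}, \ldots, \nu_{(s)}; \nu_{(s+1)})$ where

$\nu_{(1)} = \nu_1 + \cdots + \nu_{k_1}, \ \ldots, \ \nu_{(s)} = \nu_{k_1 + \cdots + k_{s-1} + 1} + \cdots + \nu_{k_1 + \cdots + k_s},$

$\nu_{(s+1)} = \nu_{k_1 + \cdots + k_s + 1} + \cdots + \nu_{k+1}.$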

For we have by direct evaluation from (7.7.1) 


(7.7.12) • • • a:;‘(l - z^ - a:*,,,-a:,/'”] 

— r . + l ^fci-n) * ‘ + ^k)^(j^k + l >*(!)) 


where 

r(V(i) + + 1 + • • • + Vjfe + Vj+I + r*j+i + • 

• + »•» + '•(!)) 

(7.7.13) $C = \dfrac{\Gamma(\nu_{(1)} + \nu_{k_1 + 1} + \cdots + \nu_{k+1})}{\Gamma(\nu_{(1)}) \Gamma(\nu_{k_1 + 1}) \cdots \Gamma(\nu_{k+1})}.$

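But the right-hand side of (7.7.12) is the value of

$\mathcal{E}\bigl( z_1^{r_{(1)}} x_{k_1 + 1}^{r_{k_1 + 1}} \cdots x_k^{r_k} \bigr)$

computed from the p.d.f.

(7.7.14) $f(z_1, x_{k_1 + 1}, \ldots, x_k) = C \, z_1^{\nu_{(1)} - 1} x_{k_1 + 1}^{\nu_{k_1 + 1} - 1} \cdots x_k^{\nu_k - 1} (1 - z_1 - x_{k_1 + 1} - \cdots - x_k)^{\nu_{k+1} - 1}$

which is uniquely determined, since $(z_1, x_{k_1 + 1}, \ldots, x_k)$ is a bounded random variable.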

We can now repeat the process to bring in $z_2$ and so on, finally obtaining

(7.7.15) $f(z_1, \ldots, z_s) = \dfrac{\Gamma(\nu_{(1)} + \cdots + \nu_{(s+1)})}{\Gamma(\nu_{(1)}) \cdots \Gamma(\nu_{(s+1)})} z_1^{\nu_{(1)} - 1} \cdots z_s^{\nu_{(s)} - 1} (1 - z_1 - \cdots - z_s)^{\nu_{(s+1)} - 1}$

which establishes 7.7.5.

It will be useful, particularly in Section 8.7, in connection with applications to order statistics, to introduce another distribution closely related to the k-variate Dirichlet distribution. Let $(x_1, \ldots, x_k)$ be random variables having the k-variate Dirichlet distribution $D(\nu_1, \ldots, \nu_k; \nu_{k+1})$. Let

(7.7.16)
$y_1 = x_1$
$y_2 = x_1 + x_2$
$\cdots$
$y_k = x_1 + \cdots + x_k.$

Since the Jacobian of this transformation is unity, we have the following p.d.f. for $(y_1, \ldots, y_k)$:

(7.7.17) $f(y_1, \ldots, y_k) = \dfrac{\Gamma(\nu_1 + \cdots + \nu_{k+1})}{\Gamma(\nu_1) \cdots \Gamma(\nu_{k+1})} y_1^{\nu_1 - 1} (y_2 - y_1)^{\nu_2 - 1} \cdots (y_k - y_{k-1})^{\nu_k - 1} (1 - y_k)^{\nu_{k+1} - 1}$

where the range of the y's is the region $0 < y_1 < \cdots < y_k < 1$. It will be convenient to call the distribution having p.d.f. (7.7.17) the ordered k-variate Dirichlet distribution $D^*(\nu_1, \ldots, \nu_k; \nu_{k+1})$. Note that when $k = 1$, (7.7.17) reduces to the p.d.f. of the beta distribution $Be(\nu_1, \nu_2)$.
Sometimes we are interested in the marginal distribution of a subset of s of the y's and the following result will be useful in problems of order statistics:

7.7.6 If $(y_1, \ldots, y_k)$ is a vector random variable having the ordered k-variate Dirichlet distribution $D^*(\nu_1, \ldots, \nu_k; \nu_{k+1})$, then the marginal distribution of $(y_{k_1}, y_{k_1 + k_2}, \ldots, y_{k_1 + \cdots + k_s})$ is the ordered s-variate Dirichlet distribution $D^*(\nu_{(1)}, \ldots, \nu_{(s)}; \nu_{(s+1)})$ where $\nu_{(1)}, \ldots, \nu_{(s+1)}$ are defined in 7.7.5.

For the random variables $y_{k_1}, \ldots, y_{k_1 + \cdots + k_s}$, which are defined in (7.7.16), have the same distribution as the random variables $z_1, z_1 + z_2, \ldots, z_1 + \cdots + z_s$ defined in 7.7.5. Since $(z_1, \ldots, z_s)$ has the s-variate Dirichlet distribution $D(\nu_{(1)}, \ldots, \nu_{(s)}; \nu_{(s+1)})$, it follows by definition that $(z_1, z_1 + z_2, \ldots, z_1 + \cdots + z_s)$ has the ordered s-variate Dirichlet distribution $D^*(\nu_{(1)}, \ldots, \nu_{(s)}; \nu_{(s+1)})$.

It should be noted that 7.7.6 can also be established by the direct integration of (7.7.17) with respect to all y's except $y_{k_1}, \ldots, y_{k_1 + \cdots + k_s}$. We would integrate successively with respect to $y_1, \ldots, y_{k_1 - 1}$ over the region $0 < y_1 < \cdots < y_{k_1 - 1} < y_{k_1}$, then successively with respect to $y_{k_1 + 1}, \ldots, y_{k_1 + k_2 - 1}$ over the region $y_{k_1} < y_{k_1 + 1} < \cdots < y_{k_1 + k_2}$, and so on.

7.8 DISTRIBUTIONS INVOLVED IN THE ANALYSIS OF VARIANCE

Three distributions closely related to gamma and beta distributions, 
namely, the chi-square distribution, the “Student” distribution, and the 
Snedecor distribution, are fundamental in the analysis of variance and in 
other statistical procedures based on normally distributed random variables, 
which are discussed in later chapters. It will be convenient to state these 
distributions here in their basic forms as further examples of important 
continuous distributions. 


(a) The Chi-Square Distribution

If we make the change of variable

$x = \chi^2 / 2$

in the p.e. of the gamma distribution $G(\mu)$, we obtain

(7.8.1) $dF_{2\mu}(\chi^2) = \dfrac{(\chi^2)^{\mu - 1} e^{-\chi^2 / 2}}{2^\mu \, \Gamma(\mu)} \, d(\chi^2).$

This is the p.e. of the chi-square distribution with $2\mu$ degrees of freedom, and when a random variable has such a distribution we shall say it has the chi-square distribution $C(2\mu)$. In most statistical applications of this distribution $2\mu$ is a positive integer. Values of $\chi^2_\alpha$ for which $P(\chi^2 > \chi^2_\alpha) = \alpha$, for various values of $\alpha$ from 0.001 to 0.99, with $2\mu = 1, 2, \ldots, 30$ have been tabulated by Fisher (1925a). More extensive tables are given in Biometrika Tables for Statisticians, edited by E. S. Pearson and Hartley (1954).

It will be noted that the characteristic function of the chi-square distribution $C(2\mu)$ is

(7.8.2) $\varphi(t) = (1 - 2it)^{-\mu}$

which is obtained by replacing t by 2t in (7.5.6).

It is evident from (7.8.2) that

7.8.1 The chi-square distribution $C(2\mu)$ is reproductive with respect to $\mu$.

Since $\chi^2 / 2$ has the gamma distribution $G(\mu)$ it follows from (7.5.3) that the rth moment of $\chi^2$ is

$\mu'_r = \dfrac{2^r \, \Gamma(\mu + r)}{\Gamma(\mu)}.$

The mean and variance of the chi-square distribution $C(2\mu)$ are therefore

$\mathcal{E}(\chi^2) = 2\mu, \quad \sigma^2(\chi^2) = 4\mu.$

The chi-square notation was introduced by K. Pearson (1900), and is rather awkward, but since it is deeply embedded in statistical literature we shall use it.

Returning to the problem of the distribution of $Q(x_1, \ldots, x_k)$, given in terms of the gamma distribution by 7.5.2, this can be expressed in chi-square notation as follows:

7.8.2 If $(x_1, \ldots, x_k)$ is a vector random variable having the k-variate distribution $N(\{\mu_i\}, \|\sigma_{ij}\|)$, $i, j = 1, \ldots, k$, then $\sum_{i,j=1}^{k} \sigma^{ij} (x_i - \mu_i)(x_j - \mu_j)$ has the chi-square distribution $C(k)$.

An important corollary of 7.8.2 is:

7.8.2a If x is a random variable having the distribution $N(0, 1)$, then $x^2$ has the chi-square distribution $C(1)$.
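Combining 7.8.2a with the reproductive property 7.8.1, a sum of k squared independent $N(0, 1)$ variables should have the chi-square distribution $C(k)$, with mean k and variance 2k. The following simulation sketch checks this for an illustrative $k = 5$; the seed, sample size, and use of `random.gauss` to generate normal variables are assumptions of the example.

```python
import random

rng = random.Random(31)   # fixed seed for reproducibility
k = 5                     # illustrative number of degrees of freedom
n = 100_000

# Each draw is a sum of k squared independent N(0, 1) variables.
chi_sq = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(n)]

sample_mean = sum(chi_sq) / n
sample_var = sum((c - sample_mean) ** 2 for c in chi_sq) / n

print(sample_mean, k)        # mean of C(k) is k
print(sample_var, 2 * k)     # variance of C(k) is 2k
```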

The reason for using the term degrees of freedom with reference to $2\mu$ in (7.8.1) becomes apparent when we see that the number of degrees of
freedom mentioned in 7.8.2 refers to the number of random variables 
involved in the (nondegenerate) normal distribution considered. That the 
choice of this term is a reasonable one will become clearer as we discuss 
some applications of the chi-square distribution in later chapters. 


(b) The "Student" Distribution

The mathematical essentials of this distribution and how it arises from other random variables, may be stated as follows:

7.8.3 Suppose u is a random variable having the distribution $N(0, 1)$ and v is a random variable having the chi-square distribution $C(k)$. If u and v are independent, the random variable

(7.8.3) $t = \dfrac{u}{\sqrt{v/k}}$

has the p.d.f.

(7.8.4) $f_k(t) = \dfrac{\Gamma\bigl( \tfrac{k+1}{2} \bigr)}{\sqrt{\pi k} \, \Gamma\bigl( \tfrac{k}{2} \bigr)} \Bigl( 1 + \dfrac{t^2}{k} \Bigr)^{-(k+1)/2}.$

This is the p.d.f. of the "Student" distribution with k degrees of freedom. For brevity we shall call this distribution the "Student" distribution $S(k)$. Various applications of this distribution are discussed in Sections 8.4 and 10.4.


To establish the distribution (7.8.4) we apply the transformation

(7.8.5) $s = v, \quad t = \dfrac{u}{\sqrt{v/k}}$

to the p.e. of u and v, namely,

(7.8.6) $\dfrac{1}{\sqrt{2\pi} \, 2^{k/2} \, \Gamma(\tfrac{k}{2})} v^{k/2 - 1} e^{-(u^2 + v)/2} \, du \, dv,$

and then take the marginal distribution of t.

The Jacobian of the transformation (7.8.5) is $\sqrt{s/k}$ and hence the p.e. of s and t is

(7.8.7) $\dfrac{1}{\sqrt{2\pi k} \, 2^{k/2} \, \Gamma(\tfrac{k}{2})} s^{(k-1)/2} \exp\Bigl[ -\dfrac{s}{2} \Bigl( 1 + \dfrac{t^2}{k} \Bigr) \Bigr] ds \, dt.$

Taking the marginal p.e. of t (that is, integrating from 0 to $\infty$ with respect to s), we obtain the p.d.f. of t as that given by (7.8.4).

Values of $t_\alpha$ for which $P(|t| > t_\alpha) = \alpha$ have been tabulated by Fisher (1925a) for various values of $\alpha$ from 0.1 to 0.99 with $k = 1, 2, \ldots, 30$. A tabulation is also given in Pearson and Hartley's (1954) Biometrika Tables for Statisticians.

Note that all odd moments of the distribution having p.d.f. (7.8.4) which exist are 0. As for the even moments which exist, we have

$\mu'_{2r} = \mathcal{E}(t^{2r}) = k^r \, \mathcal{E}(u^{2r} v^{-r})$

and since u and v are independent

$\mu'_{2r} = k^r \, \mathcal{E}(u^{2r}) \, \mathcal{E}(v^{-r}).$

But $u^2$ and v have independent chi-square distributions $C(1)$ and $C(k)$, respectively. Therefore,

(7.8.8) $\mu_{2r} = \mu'_{2r} = k^r \dfrac{\Gamma(\tfrac{1}{2} + r) \, \Gamma(\tfrac{k}{2} - r)}{\Gamma(\tfrac{1}{2}) \, \Gamma(\tfrac{k}{2})}$

which shows that $\mu_{2r}$ exists if and only if $-1 < 2r < k$.

The mean and variance of the "Student" distribution $S(k)$ are defined for $k > 1$ and $k > 2$ respectively, and have values

$\mathcal{E}(t) = 0, \quad \sigma^2(t) = \dfrac{k}{k - 2}.$

The following statement gives the relationship between the "Student" distribution and the beta distribution:

7.8.4 If t is a random variable having the "Student" distribution $S(k)$, then $x = \dfrac{1}{1 + t^2/k}$ has the beta distribution $Be(\tfrac{1}{2}k, \tfrac{1}{2})$.
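The "Student" p.d.f. $\Gamma(\tfrac{k+1}{2}) / [\sqrt{\pi k}\, \Gamma(\tfrac{k}{2})] \cdot (1 + t^2/k)^{-(k+1)/2}$ and the variance $k/(k - 2)$ can be checked by numerical integration. The sketch below uses a midpoint rule on a wide finite interval; the choice $k = 6$, the integration limits, and the step count are illustrative assumptions, and the truncated tails contribute a negligible amount for this k.

```python
import math


def student_pdf(t, k):
    """P.d.f. of the Student distribution with k degrees of freedom."""
    c = math.gamma((k + 1) / 2.0) / (math.sqrt(math.pi * k) * math.gamma(k / 2.0))
    return c * (1.0 + t * t / k) ** (-(k + 1) / 2.0)


def integrate(func, a, b, steps):
    """Midpoint-rule approximation to the integral of func over (a, b)."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


k = 6  # illustrative degrees of freedom (variance requires k > 2)
total_mass = integrate(lambda t: student_pdf(t, k), -200.0, 200.0, 400_000)
second_moment = integrate(lambda t: t * t * student_pdf(t, k), -200.0, 200.0, 400_000)

print(total_mass)                  # close to 1
print(second_moment, k / (k - 2))  # variance, since the mean is 0
```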


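(c) The Snedecor Distribution

The mathematical essentials of this distribution and how it arises may be stated as follows:

7.8.5 Suppose u and v are independent random variables having chi-square distributions $C(k_1)$ and $C(k_2)$, respectively. The random variable

$\mathcal{F} = \dfrac{u / k_1}{v / k_2}$

has the p.e.

(7.8.9) $\dfrac{\Gamma\bigl( \tfrac{k_1 + k_2}{2} \bigr)}{\Gamma(\tfrac{k_1}{2}) \Gamma(\tfrac{k_2}{2})} \Bigl( \dfrac{k_1}{k_2} \Bigr)^{k_1/2} \mathcal{F}^{k_1/2 - 1} \Bigl( 1 + \dfrac{k_1}{k_2} \mathcal{F} \Bigr)^{-(k_1 + k_2)/2} d\mathcal{F}.$

This is the p.e. of the Snedecor distribution with $k_1$, $k_2$ degrees of freedom. For the sake of brevity we shall refer to this distribution as the Snedecor distribution $S(k_1, k_2)$. Applications of this distribution are discussed in Sections 10.4 and 10.6.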

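The Snedecor distribution can be established by applying the transformation

(7.8.10) $s = v, \quad \mathcal{F} = \dfrac{u / k_1}{v / k_2}$

to the p.e. of u and v

(7.8.11) $\dfrac{1}{4 \, \Gamma(\tfrac{1}{2}k_1) \Gamma(\tfrac{1}{2}k_2)} \Bigl( \dfrac{u}{2} \Bigr)^{k_1/2 - 1} \Bigl( \dfrac{v}{2} \Bigr)^{k_2/2 - 1} e^{-(u + v)/2} \, du \, dv$

and taking the marginal distribution of $\mathcal{F}$. This gives as the p.e. of s and $\mathcal{F}$

(7.8.12) $\dfrac{(k_1 / k_2)^{k_1/2}}{2 \, \Gamma(\tfrac{1}{2}k_1) \Gamma(\tfrac{1}{2}k_2)} \mathcal{F}^{k_1/2 - 1} \Bigl( \dfrac{s}{2} \Bigr)^{(k_1 + k_2)/2 - 1} \exp\Bigl[ -\dfrac{s}{2} \Bigl( 1 + \dfrac{k_1}{k_2} \mathcal{F} \Bigr) \Bigr] ds \, d\mathcal{F}.$

Integrating with respect to s from 0 to $\infty$, we obtain (7.8.9) as the p.e. of $\mathcal{F}$.

The rth moment of the Snedecor distribution is

(7.8.13) $\mu'_r = \Bigl( \dfrac{k_2}{k_1} \Bigr)^r \dfrac{\Gamma(\tfrac{1}{2}k_1 + r) \, \Gamma(\tfrac{1}{2}k_2 - r)}{\Gamma(\tfrac{1}{2}k_1) \, \Gamma(\tfrac{1}{2}k_2)}$

which exists only for $-k_1 < 2r < k_2$. For the mean and variance of $\mathcal{F}$ we have

(7.8.14) $\mathcal{E}(\mathcal{F}) = \dfrac{k_2}{k_2 - 2}, \quad \sigma^2(\mathcal{F}) = \dfrac{2 k_2^2 (k_1 + k_2 - 2)}{k_1 (k_2 - 2)^2 (k_2 - 4)}.$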

There is also a connection between the Snedecor and beta distributions which may be stated as follows:

7.8.6 If $\mathcal{F}$ is a random variable having the Snedecor distribution $S(k_1, k_2)$, then the random variable $[1 + k_1 \mathcal{F} / k_2]^{-1}$ has the beta distribution $Be(\tfrac{1}{2}k_2, \tfrac{1}{2}k_1)$, and $\dfrac{k_1 \mathcal{F}}{k_2 + k_1 \mathcal{F}}$ has the beta distribution $Be(\tfrac{1}{2}k_1, \tfrac{1}{2}k_2)$.

The Snedecor distribution is, from a practical point of view, a slightly more convenient form of one originally suggested by Fisher (1924) for use in the analysis of variance. The random variable z originally proposed by Fisher is related to Snedecor's $\mathcal{F}$ as follows:

(7.8.15) $z = \tfrac{1}{2} \log \mathcal{F}.$

Fisher (1925a) has tabulated values of $z_\alpha$ for which $P(z > z_\alpha) = \alpha$ for $\alpha$ = 0.01, 0.05 and for $k_1$ = 1, 2, 3, 4, 5, 6, 8, 12, 24, $\infty$, and for $k_2$ = 1, 2, ..., 30, 60, $\infty$. Snedecor (1937) has tabulated the values of $\mathcal{F}_\alpha$ corresponding to $z_\alpha$ (where $z_\alpha = \tfrac{1}{2} \log \mathcal{F}_\alpha$) for $\alpha$ = 0.01, 0.05 and for $k_1$ = 1, 2, ..., 12, 14, 16, 20, 24, 30, 40, 50, 75, 100, 200, 500, $\infty$, and $k_2$ = 1, 2, ..., 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 55, 60, 65, 70, 80, 100, 125, 150, 200, 400, 1000, $\infty$. For subsequent tabulations see Greenwood and Hartley (1961).

If $\mathcal{F}$ has the Snedecor distribution $S(k_1, k_2)$, it follows from 7.8.6 that $\mathcal{F}_\alpha$ can also be determined from Karl Pearson's (1934) Tables of the Incomplete Beta Function, since $I_{x_\alpha}(\tfrac{1}{2}k_1, \tfrac{1}{2}k_2) = 1 - \alpha$ and $\mathcal{F}_\alpha = \dfrac{k_2 x_\alpha}{k_1 (1 - x_\alpha)}$.

PROBLEMS 


7.1 Prove 7.2.2.

7.2 If x is a random variable having the normal distribution $N(\mu, \sigma^2)$, show that

$\mathcal{E}(|x - \mu|) = \sqrt{2/\pi} \, \sigma.$

7.3 If x is a continuous random variable with a unique median $x_{0.5}$ show that $\mathcal{E}(|x - c|)$ is minimized for $c = x_{0.5}$.

7.4 Suppose $x_1$ is a random variable having the normal distribution $N(\mu, \sigma^2)$, and $x_2 \mid x_1$ is a conditional random variable having the normal distribution $N(x_1, \sigma^2)$. Show that $(x_1, x_2)$ has the two-dimensional normal distribution $N(\{\mu\}, \|\sigma_{ij}\|)$ where

$\|\sigma_{ij}\| = \begin{Vmatrix} \sigma^2 & \sigma^2 \\ \sigma^2 & 2\sigma^2 \end{Vmatrix}.$

7.5 Let $f(x_1, x_2)$ be the p.d.f. of the circular normal bivariate distribution $N(\{0\}, \|\delta_{ij}\|)$, $i, j = 1, 2$. Show that the integral of $f(x_1, x_2)$ over the square with vertices $(-k, -k)$, $(-k, +k)$, $(k, -k)$, $(k, k)$ is less than the integral of $f(x_1, x_2)$ over the circle $x_1^2 + x_2^2 < 4k^2/\pi$ and hence that

$\dfrac{1}{\sqrt{2\pi}} \int_0^k e^{-t^2/2} \, dt < \tfrac{1}{2} \sqrt{1 - e^{-2k^2/\pi}},$

a result due to Williams (1946).


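7.6 Suppose $f(x_1, x_2)$ is the p.d.f. of the two-dimensional normal distribution $N(\{0\}, \|\sigma_{ij}\|)$ where

$\|\sigma_{ij}\| = \begin{Vmatrix} 1 & \rho \\ \rho & 1 \end{Vmatrix}.$

Let $F_1(x)$ and $F_2(x)$ be the c.d.f.'s of the marginal distributions of $x_1$ and $x_2$, respectively. Show that the correlation coefficient of the random variables $F_1(x_1)$, $F_2(x_2)$ is $(6/\pi) \sin^{-1}(\tfrac{1}{2}\rho)$.

7.7 If $(x_1, x_2)$ is a two-dimensional random variable having the distribution $N(\{0\}, \|\sigma_{ij}\|)$, where

$\|\sigma_{ij}\| = \begin{Vmatrix} 1 & \rho \\ \rho & 1 \end{Vmatrix},$

show that the ratio

$z = \dfrac{x_1}{x_2}$

has p.d.f.

$\dfrac{\sqrt{1 - \rho^2}}{\pi [1 - 2\rho z + z^2]}.$

[Fieller (1932).]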


7.8 If $x_1$ and $x_2$ are independent random variables each having distribution
$R(\tfrac{1}{2}, 1)$, show that $\sqrt{-2 \log x_1}\, \cos 2\pi x_2$ and $\sqrt{-2 \log x_1}\, \sin 2\pi x_2$ are independent
random variables each having the distribution $N(0, 1)$. [Box and Muller
(1958).]
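
The transformation of Problem 7.8 is easy to try numerically. The sketch below assumes that $R(\tfrac{1}{2}, 1)$ denotes the rectangular distribution on the interval $(0, 1)$ and uses Python's standard pseudo-random generator; the function names and the seed are illustrative. The printed sample mean and variance should be near 0 and 1, which is consistent with, but of course does not prove, the stated result.

```python
import math
import random

def box_muller(u1: float, u2: float) -> tuple[float, float]:
    """Map two values in (0, 1) to the pair of random variables of Problem 7.8."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

def sample_normals(n_pairs: int, seed: int = 0) -> list[float]:
    """Draw 2 * n_pairs values; 1 - random() lies in (0, 1], so the log is defined."""
    rng = random.Random(seed)
    out: list[float] = []
    for _ in range(n_pairs):
        y1, y2 = box_muller(1.0 - rng.random(), rng.random())
        out.extend((y1, y2))
    return out

values = sample_normals(50_000)
mean = sum(values) / len(values)
var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
print(f"sample mean = {mean:.4f}, sample variance = {var:.4f}")
```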


7.9 If $x_1$ and $x_2$ are independent random variables having gamma distributions
$G(k_1)$ and $G(k_2)$ respectively, show that $x_1 + x_2$ and $x_1/(x_1 + x_2)$ are independent
random variables (having the gamma distribution $G(k_1 + k_2)$ and beta distribution
$Be(k_1, k_2)$, respectively).

7.10 If $x_1$ and $x_2$ are independent random variables with gamma distributions
$G(\nu)$ and $G(\nu + \tfrac{1}{2})$ show that the random variable $y = 2\sqrt{x_1 x_2}$ has the gamma
distribution $G(2\nu)$.

7.11 A two-dimensional random variable has the p.d.f.

    $\frac{1}{\Gamma(k_1)\Gamma(k_2)}\, x_1^{k_1 - 1} (x_2 - x_1)^{k_2 - 1} e^{-x_2}$

in the $x_1x_2$-plane where $0 < x_1 < x_2 < \infty$, and 0 elsewhere. Show that the
marginal distributions of $x_1$ and $x_2$ are gamma distributions $G(k_1)$ and $G(k_1 + k_2)$.

7.12 Suppose $x$ is a continuous random variable such that for some positive
integer $k$, $kx$ has the gamma distribution $G(k)$. Suppose $y \mid x$ is a discrete
conditional random variable which has the Poisson distribution $Po(x)$. Show that
the unconditional distribution of $y$ is a binomial waiting-time distribution.

7.13 If $x_1$ and $x_2$ are independent random variables having beta distributions
$Be(\nu_1, \nu_2)$ and $Be(\nu_1 + \tfrac{1}{2}, \nu_2)$ show that $\sqrt{x_1 x_2}$ has the beta distribution
$Be(2\nu_1, 2\nu_2)$.

7.14 If $x_1, \ldots, x_k$ are independent random variables each having the
rectangular distribution $R(\tfrac{1}{2}, 1)$ show that $-\log(x_1 \cdots x_k)$ has the gamma
distribution $G(k)$.

7.15 If $u$ is a random variable having the chi-square distribution $C(k)$ show,
by the use of characteristic functions, that the limiting distribution of $(u - k)/\sqrt{2k}$
as $k \to \infty$ is the distribution $N(0, 1)$.

7.16 Suppose $x$ is a random variable with c.d.f. $F(x)$ and characteristic
function $\varphi(t)$. If $[\varphi(t)]^{1/n}$ is a characteristic function of some random variable for
every positive integer $n$ then $x$ is said to be infinitely divisible, that is, for each $n$,
$F(x)$ is the distribution of a sum of $n$ independent random variables, each having
$[\varphi(t)]^{1/n}$ as its characteristic function. Show that a random variable having a
Poisson, gamma, or normal distribution is infinitely divisible. [See Gnedenko
and Kolmogorov (1954) for a treatment of infinitely divisible random variables.]

7.17 Show that the p.d.f. of the Student $t$-distribution given by (7.8.4)
converges to the p.d.f. of the distribution $N(0, 1)$ for every $t$ as $k \to \infty$.

7.18 If $x_1, \ldots, x_k$ are independent random variables all having the
distribution $N(\mu, \sigma^2)$ and if $c_1, \ldots, c_k$ are (real) constants whose sum is 1 and whose
sum of squares is 1, show that $c_1 x_1 + \cdots + c_k x_k$ also has the distribution
$N(\mu, \sigma^2)$. Show that no set of values of the $c$'s exists, which are all positive,
satisfying these conditions.

7.19 If $x_1, \ldots, x_k$ are independent random variables, each having the
distribution $N(0, 1)$, and if $y_1, \ldots, y_s$, $s < k$, are new random variables defined
by $y_p = c_{p1} x_1 + \cdots + c_{pk} x_k$, $p = 1, \ldots, s$, where $\sum_i c_{pi} c_{qi} = 0$ for $p \ne q$, and
1 for $p = q$, show that $y_1, \ldots, y_s$ are independent, each having the distribution
$N(0, 1)$.

7.20 Show by appropriate differentiations of the characteristic function
(7.4.17) of the $k$-dimensional random variable $(x_1, \ldots, x_k)$ having the distribution
$N(\{\mu_i\}, \|\sigma_{ij}\|)$ that $\mathscr{E}(x_i) = \mu_i$ and $\mathscr{E}[(x_i - \mu_i)(x_j - \mu_j)] = \sigma_{ij}$.

7.21 Show (by the use of characteristic functions) that if $(x_1, \ldots, x_k)$ and
$(x_1', \ldots, x_k')$ are independent $k$-dimensional random variables having
distributions $N(\{\mu_i\}, \|\sigma_{ij}\|)$ and $N(\{\mu_i'\}, \|\sigma_{ij}'\|)$ then the $k$-dimensional random
variable $(x_1 + x_1', \ldots, x_k + x_k')$ has the normal distribution $N(\{\mu_i + \mu_i'\},
\|\sigma_{ij} + \sigma_{ij}'\|)$. That is, the $k$-dimensional normal distribution is reproductive with
respect to its vector of means and its covariance matrix.

7.22 Suppose $\mu$ is the "true" length of a standard bar, and let $x_1$ be a random
variable denoting the length of a first-generation copy of the bar. Let $x_2$ be a
random variable denoting the length of a copy of $x_1$, that is, a second-generation
copy of the bar and so on, until $x_k$ is a random variable denoting the length of
a $k$th generation copy of the bar. If $(x_1 - \mu), (x_2 - x_1), \ldots, (x_k - x_{k-1})$
are independent, each having the normal distribution $N(0, 1)$, show that
the distribution of $(x_1, \ldots, x_k)$ is a $k$-dimensional normal distribution
$N(\{\mu_i\}, \|\sigma_{ij}\|)$ where $\mu_i = \mu$ and

    $\|\sigma_{ij}\| = \begin{Vmatrix}
    1 & 1 & 1 & \cdots & 1 & 1 \\
    1 & 2 & 2 & \cdots & 2 & 2 \\
    1 & 2 & 3 & \cdots & 3 & 3 \\
    \vdots & \vdots & \vdots & & \vdots & \vdots \\
    1 & 2 & 3 & \cdots & k-1 & k-1 \\
    1 & 2 & 3 & \cdots & k-1 & k
    \end{Vmatrix}.$
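
The pattern of the covariance matrix in Problem 7.22 can be generated mechanically. The sketch below assumes, as in the problem, that each copy adds one independent error of unit variance to the copy before it, so that the covariance of $x_i$ and $x_j$ is the number of errors they share. The function name is illustrative.

```python
def copy_covariance(k: int) -> list[list[int]]:
    """Covariance matrix of (x_1, ..., x_k) when x_i = mu + e_1 + ... + e_i
    with independent unit-variance errors e_1, ..., e_k."""
    # Row i of `loads` marks which errors enter x_i (lower-triangular 0/1 pattern).
    loads = [[1 if m <= i else 0 for m in range(k)] for i in range(k)]
    return [
        [sum(a * b for a, b in zip(loads[i], loads[j])) for j in range(k)]
        for i in range(k)
    ]

for row in copy_covariance(5):
    print(row)
```

Each entry equals the smaller of the two generation numbers, in agreement with the matrix displayed in the problem.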


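7.23 If $(x_1, \ldots, x_k)$ is a vector random variable having the $k$-dimensional
spherical normal distribution $N(\{\mu_i\}, \|\sigma^2 \delta_{ij}\|)$ where $\delta_{ij} = 1$ for $i = j$, and 0 for
$i \ne j$, and if

    $r = [(x_1 - \mu_1)^2 + \cdots + (x_k - \mu_k)^2]^{1/2},$

show that

    $\mathscr{E}(r) = \sqrt{2}\,\sigma\, \frac{\Gamma(\tfrac{1}{2}k + \tfrac{1}{2})}{\Gamma(\tfrac{1}{2}k)}.$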


7.24 Assume that a certain class of objects has a “mortality law” such that the
probability of an object drawn at random from the class “expiring” during the
time interval $(t, t + dt)$ (in suitable time units) is

    $\frac{t^{k-1} e^{-t}}{\Gamma(k)}\, dt.$

As soon as the object “expires” it is replaced by another of the same type. As
soon as the second one “expires” it is replaced by a third of the same type, and
so on. Show that the probability that the $r$th object in such a sequence “expires”
during $(t, t + dt)$ is

    $\frac{t^{rk-1} e^{-t}}{\Gamma(rk)}\, dt.$


7.25 (Continuation) Renewal process with fixed general “mortality law.” If
$f(t)\,dt$ is the probability that an object in an initial (zeroth) generation has to
be replaced by a successor during $(t, t + dt)$, and if $g_n(\tau)\,d\tau$ is the probability
that an object in the $n$th generation has to be replaced by a successor during
$(\tau, \tau + d\tau)$, and assuming the objects in any generation have the same “mortality
law” as the initial generation, show that

    $g_{n+1}(t) = \int_0^t g_n(\tau) f(t - \tau)\, d\tau$

and hence that if a “steady state” replacement law $g(t)$ exists it must satisfy the
integral equation $g(t) = \int_0^t g(\tau) f(t - \tau)\, d\tau$. [Lotka (1939) and Smith (1958).]


7.26 Suppose .. ., are independent random variables all having the 
distribution N(0, 1). Let 


2/i = 


xf + 


i = 




Show that $(y_1, \ldots, y_k)$ has the $k$-dimensional Dirichlet distribution

7.27 Prove 7.7.1. 

7.28 Let $y_{k_2} = k_1 \mathscr{F}$ where $\mathscr{F}$ is a random variable having the Snedecor
distribution $S(k_1, k_2)$. Show that the sequence of random variables $(y_1, y_2, \ldots)$
converges in distribution to a random variable having the chi-square distribution
with $k_1$ degrees of freedom.

7.29 Suppose $(x_1, \ldots, x_k)$ is a $k$-dimensional random variable whose p.d.f.
is of form $g(a_1 x_1 + \cdots + a_k x_k)$ over the region where $x_1 > 0, \ldots, x_k > 0$, and
0 elsewhere, and where $a_1, \ldots, a_k$ are positive constants. Show that the p.d.f. of
the random variable $y = a_1 x_1 + \cdots + a_k x_k$ is

    $\frac{y^{k-1} g(y)}{(a_1 \cdots a_k)\, \Gamma(k)}, \quad y > 0,$ and 0 for $y < 0$.

7.30 If $(x_1, \ldots, x_k)$ is a $k$-dimensional random variable having p.d.f. of form
$g(y)$ where $y = \sum_{i,j} a_{ij} x_i x_j$, the matrix of constants $\|a_{ij}\|$ being symmetric and
positive definite, show that the random variable $y$ has p.d.f.

    $\frac{\pi^{k/2}\, y^{k/2 - 1} g(y)}{\sqrt{|a_{ij}|}\; \Gamma(k/2)}, \quad y > 0,$ and 0 for $y < 0$.

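7.31 If $(x_1, \ldots, x_k)$ is a $k$-dimensional random variable having the
distribution $N(\{\mu_i\}; \|\sigma_{ij}\|)$ show that the conditional p.d.f. $f(x_1, \ldots, x_s \mid x_{s+1}, \ldots, x_k)$
is the p.d.f. of the $s$-dimensional distribution $N(\{\mu_p^*\}, \|\sigma_{pq}^*\|)$ where the
value of $\mu_p^*$ is given by the equation

    $\begin{vmatrix}
    (\mu_p^* - \mu_p) & \sigma_{p\,s+1} & \cdots & \sigma_{pk} \\
    (x_{s+1} - \mu_{s+1}) & \sigma_{s+1\,s+1} & \cdots & \sigma_{s+1\,k} \\
    \vdots & \vdots & & \vdots \\
    (x_k - \mu_k) & \sigma_{k\,s+1} & \cdots & \sigma_{kk}
    \end{vmatrix} = 0$

and where

    $\sigma_{pq}^* = \begin{vmatrix}
    \sigma_{pq} & \sigma_{p\,s+1} & \cdots & \sigma_{pk} \\
    \sigma_{s+1\,q} & \sigma_{s+1\,s+1} & \cdots & \sigma_{s+1\,k} \\
    \vdots & \vdots & & \vdots \\
    \sigma_{kq} & \sigma_{k\,s+1} & \cdots & \sigma_{kk}
    \end{vmatrix} \Bigg/ \begin{vmatrix}
    \sigma_{s+1\,s+1} & \cdots & \sigma_{s+1\,k} \\
    \vdots & & \vdots \\
    \sigma_{k\,s+1} & \cdots & \sigma_{kk}
    \end{vmatrix}, \quad p, q = 1, \ldots, s.$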

7.32 Poisson process. Consider the following approach to (7.5.11). If we
denote by $E$ the event of $k$ C's occurring during $(0, t + \Delta t)$, then $E$ is the union
of $k + 1$ mutually exclusive events $E_0, E_1, \ldots, E_k$, where $E_i$ is the event of $i$ C's
occurring during $(t, t + \Delta t)$ and $k - i$ C's occurring during $(0, t)$. Therefore

    $P(E) = P(E_0) + P(E_1) + \cdots + P(E_k).$

If we assume that the occurrence of any number of C's in a time interval $I$ is
independent of the occurrence of any number of C's in a time interval $I'$ where
$I \cap I' = \emptyset$, then the probabilities of the occurrence of $0, 1, \ldots, k$ C's during
$(t, t + \Delta t)$ are

    $1 - \lambda\Delta t + o(\Delta t), \quad \lambda\Delta t + o(\Delta t), \quad \ldots, \quad (\lambda\Delta t)^k + o((\Delta t)^k),$

respectively. Then if we denote $P(E)$ by $f_k(t + \Delta t)$ we have

    $P(E_0) = f_k(t)(1 - \lambda\Delta t) + o(\Delta t)$
    $P(E_1) = f_{k-1}(t)\lambda\Delta t + o(\Delta t),$

while $P(E_2), \ldots, P(E_k)$ are of order of magnitude $(\Delta t)^2$ or smaller. We therefore
obtain

    $f_k(t + \Delta t) = f_k(t)(1 - \lambda\Delta t) + f_{k-1}(t)\lambda\Delta t + o(\Delta t).$

From this we obtain the differential equations

    $f_k'(t) = \lambda f_{k-1}(t) - \lambda f_k(t)$

for $k = 0, 1, 2, \ldots$. (Note that for $k = 0$, $f_{-1}(t) = 0$.) Show that the solution of
this set of differential equations gives for $f_k(t)$ the expression (7.5.11). Note that
the number of C's occurring during $(0, t)$, considered for all $t > 0$, is an example of a stochastic process with a continuous parameter $t$.
This particular stochastic process is called the Poisson process.
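
The differential equations of Problem 7.32 can be explored numerically before they are solved analytically. The following sketch integrates $f_k'(t) = \lambda f_{k-1}(t) - \lambda f_k(t)$ by the classical fourth-order Runge-Kutta method and compares the result with the Poisson probabilities $e^{-\lambda t}(\lambda t)^k/k!$. It assumes that (7.5.11) is this Poisson expression and that the initial conditions are $f_0(0) = 1$, $f_k(0) = 0$ for $k > 0$; the function names, step count, and parameter values are illustrative.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    return math.exp(-mean) * mean ** k / math.factorial(k)

def derivative(f: list[float], lam: float) -> list[float]:
    """Right-hand side lam * f_{k-1} - lam * f_k, with f_{-1} = 0."""
    return [lam * ((f[k - 1] if k > 0 else 0.0) - f[k]) for k in range(len(f))]

def integrate(lam: float, t_end: float, k_max: int, steps: int = 2000) -> list[float]:
    """Runge-Kutta integration from f_0(0) = 1, f_k(0) = 0 for k > 0."""
    f = [1.0] + [0.0] * k_max
    h = t_end / steps
    for _ in range(steps):
        k1 = derivative(f, lam)
        k2 = derivative([a + 0.5 * h * b for a, b in zip(f, k1)], lam)
        k3 = derivative([a + 0.5 * h * b for a, b in zip(f, k2)], lam)
        k4 = derivative([a + h * b for a, b in zip(f, k3)], lam)
        f = [a + h / 6.0 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(f, k1, k2, k3, k4)]
    return f

lam, t_end = 1.5, 2.0
numeric = integrate(lam, t_end, k_max=8)
for k, value in enumerate(numeric):
    print(k, f"{value:.8f}", f"{poisson_pmf(k, lam * t_end):.8f}")
```

Because the equation for $f_k$ involves only $f_{k-1}$ and $f_k$, cutting the system off at a largest $k$ does not disturb the lower-order solutions.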

7.33 Yule's (1924) birth process. Suppose we have a population of objects
which can generate (or give “birth” to) new objects, and that objects do not
disappear (or “die”) from the population. Let $f_k(t)$ be the probability of $k$ objects
in the population at time $t$ and consider the probability of the event $E$ of there
being $k$ at time $t + \Delta t$. Then $E$ is composed of disjoint events $E_0, E_1, \ldots, E_k$
where $E_i$ is the event that there are $k - i$ objects in the population at time $t$ and $i$
were generated during $(t, t + \Delta t)$. By making assumptions of independence of
occurrence of births in two nonoverlapping time intervals similar to those
in Problem 7.32, show that $f_k(t)$ satisfies the system of differential equations

    $f_k'(t) = (k - 1)\lambda f_{k-1}(t) - k\lambda f_k(t), \quad k = 1, 2, \ldots.$

If $k - m \ge 0$ and $f_m(0) = 1$, $f_k(0) = 0$, $k = m + 1, m + 2, \ldots$, show that the
solution of the system of differential equations is given by

    $f_k(t) = \binom{k-1}{k-m} e^{-m\lambda t} (1 - e^{-\lambda t})^{k-m}.$

7.34 Simple queueing. Suppose the average number of arrivals of customers
per unit time at a service counter is $\lambda$, whereas the average number of departures
of customers (after receiving service) per unit time is $\mu$. Let $f_k(t)$ be the
probability of there being a queue* of $k$ customers at the counter at time $t$. Then
$f_k(t + \Delta t)$ is the probability that there will be a line of $k$ customers at time
$t + \Delta t$. If $E$ is the event of there being a line of $k$ customers at time $t + \Delta t$ and
if $E_{ij}$ is the event of there being $k - i + j$ customers in the line at time $t$
and also $i$ arrivals and $j$ departures (after being served) during $(t, t + \Delta t)$, then
the collection of events $\{E_{ij}\}$ are disjoint and their union is $E$ where $(i, j)$
range over pairs of non-negative integers such that $k - i + j \ge 0$. Hence we
have

    $f_k(t + \Delta t) = P(E) = \sum_{i,j} P(E_{ij}).$

Suppose we now make assumptions of independence concerning arrivals and
also departures in disjoint time intervals similar to those concerning the
occurrence of C's in disjoint time intervals involved in Problem 7.32. Under
these assumptions the only events which have nonnegligible probabilities are

    $E_{00}, \quad E_{10}, \quad E_{01}.$

Their probabilities are: $f_k(t)[1 - (\lambda + \mu)\Delta t] + o(\Delta t)$, $f_{k-1}(t)\lambda\Delta t + o(\Delta t)$, and
$f_{k+1}(t)\mu\Delta t + o(\Delta t)$, the probabilities of all other events being of order $(\Delta t)^2$ or
higher. Show that $f_k(t)$ satisfies the following system of differential equations:

    $f_k'(t) = \lambda f_{k-1}(t) + \mu f_{k+1}(t) - (\lambda + \mu) f_k(t), \quad k = 1, 2, \ldots.$

[For a detailed treatment of waiting-line (queueing) theory the reader is referred
to Feller (1957), Kendall (1951, 1953) and Morse (1958). Morse's book has a
substantial bibliography.]

7.35 (Continuation) For the steady state case in which $f_k'(t) = 0$ show that
the solution of the resulting difference equation, namely,

    $\lambda f_{k-1} + \mu f_{k+1} - (\lambda + \mu) f_k = 0$

where $f_{-1} = 0$, is given by $f_k = \rho^k f_0$, where $\rho = \lambda/\mu$.

If the facilities will only accommodate a queue of length $n$, then show that

    $f_0 = \frac{1 - \rho}{1 - \rho^{n+1}}.$

♦ By queue here we mean those customers who are either waiting to be served or are 
being served. Some authors use the term to mean only those waiting to be served. 
Also show that $\mathscr{E}(k)$, the mean length of the queue (in the steady state case), is
given by

    $\mathscr{E}(k) = \frac{\rho[1 - (n+1)\rho^n + n\rho^{n+1}]}{(1 - \rho)(1 - \rho^{n+1})}.$

Also show that the mean length of the queue for the case $\rho = 1$ (average rates of
arrivals and of departures being equal) is $\tfrac{1}{2}n$ and the mean length of queue for
$\rho < 1$ for indefinitely large queueing accommodations ($n = \infty$) is $\rho/(1 - \rho)$.
Furthermore, show that the mean number $\mathscr{E}(l)$ of customers $l$ waiting
to be served is given by

    $\mathscr{E}(l) = \frac{\rho^2 - n\rho^{n+1} + (n - 1)\rho^{n+2}}{(1 - \rho)(1 - \rho^{n+1})}.$
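
The steady-state relations of Problem 7.35 lend themselves to an exact arithmetic check. The sketch below takes $f_k$ proportional to $\rho^k$ for $k = 0, \ldots, n$, normalizes the probabilities to sum to 1, and compares the resulting means with the closed forms $\rho[1 - (n+1)\rho^n + n\rho^{n+1}]/[(1-\rho)(1-\rho^{n+1})]$ for the queue length and $[\rho^2 - n\rho^{n+1} + (n-1)\rho^{n+2}]/[(1-\rho)(1-\rho^{n+1})]$ for the number waiting. It assumes a single server, so that $k - 1$ customers wait when $k \ge 1$ are present; the function names and the values of $\rho$ and $n$ are illustrative.

```python
from fractions import Fraction

def steady_state(rho: Fraction, n: int) -> list[Fraction]:
    """f_k = rho**k * f_0 for k = 0..n, normalized so that the f_k sum to 1."""
    weights = [rho ** k for k in range(n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def mean_queue_closed_form(rho: Fraction, n: int) -> Fraction:
    """Closed form for the mean queue length, valid for rho != 1."""
    num = rho * (1 - (n + 1) * rho ** n + n * rho ** (n + 1))
    return num / ((1 - rho) * (1 - rho ** (n + 1)))

def mean_waiting_closed_form(rho: Fraction, n: int) -> Fraction:
    """Closed form for the mean number waiting, valid for rho != 1."""
    num = rho ** 2 - n * rho ** (n + 1) + (n - 1) * rho ** (n + 2)
    return num / ((1 - rho) * (1 - rho ** (n + 1)))

rho, n = Fraction(2, 3), 5
f = steady_state(rho, n)
mean_k = sum(k * fk for k, fk in enumerate(f))
mean_l = sum((k - 1) * fk for k, fk in enumerate(f) if k >= 1)

print(f[0], (1 - rho) / (1 - rho ** (n + 1)))
print(mean_k, mean_queue_closed_form(rho, n))
print(mean_l, mean_waiting_closed_form(rho, n))
```

Exact rational arithmetic is used so that agreement is an identity rather than an approximation. For $\rho = 1$ the closed forms are indeterminate, but `steady_state` still applies and gives equal probabilities $1/(n+1)$.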



CHAPTER 8 


Sampling Theory 


8.1 DEFINITION OF A RANDOM SAMPLE 

Suppose $x$ is a one-dimensional random variable with c.d.f. $F(x)$ in $R_1$.
A random sample of size $n$ from a population with c.d.f. $F(x)$ is defined as
the $n$-dimensional random variable $(x_1, \ldots, x_n)$ with c.d.f.

(8.1.1)    $\prod_{\xi=1}^{n} F(x_\xi)$

in the sample space $R_n = R_1^{(1)} \times \cdots \times R_1^{(n)}$ where $R_1^{(\xi)}$ is the one-dimensional
sample space (the real line) of $x_\xi$, $\xi = 1, \ldots, n$. It should be
especially noted that the elements or components $x_1, \ldots, x_n$ of the sample
are mutually independent and all have the same c.d.f. In mathematical
statistics the notion of a random sample as defined above was originally
introduced by Fisher (1915), although he did not actually use the term
“sample space” in referring to $R_n$. The term “sample space” was used in
this sense, however, until the 1940's, and since then it has been used in
the wider sense as stated in Chapter 1. The word “random” has been
used mainly for descriptive effect, and will usually be omitted. We shall
frequently abbreviate further and denote the sample $(x_1, \ldots, x_n)$ by $O_n$.
For the sake of brevity we usually say that $(x_1, \ldots, x_n)$ (or $O_n$) is a sample
of size $n$ from $F(x)$. Note that a sample of size $n$ from $F(x)$ is a simple
example of a finite stochastic process, and is sometimes called simple
random sampling.

If $x$ is a random variable of the discrete type, with p.f. $p(x)$, then the
sample has a p.f.

(8.1.2)    $\prod_{\xi=1}^{n} p(x_\xi)$

in $R_n$, and if $x$ is a random variable of the continuous type the sample has
a p.d.f.

(8.1.3)    $\prod_{\xi=1}^{n} f(x_\xi)$

in $R_n$. For the sake of brevity we often refer to $(x_1, \ldots, x_n)$ as a sample
from $p(x)$ in the discrete case, and as a sample from $f(x)$ in the continuous
case.

Remark. It is convenient to think of $x_1$ as the random variable denoting the
value of $x$ in the population obtained in the first “drawing,” $x_2$ that obtained in
the second “drawing,” and so on. For example, if we throw a true die $n$ times
successively, we can regard $x_\xi$ as a random variable denoting the number of dots
appearing at the $\xi$th throw, $\xi = 1, \ldots, n$. Our sample in this case consists of
the independent random variables $x_1, \ldots, x_n$ each having the p.f. defined by

    $p(x) = \tfrac{1}{6}, \quad x = 1, \ldots, 6.$

The p.f. of the sample is

    $p(x_1) \cdots p(x_n) = (\tfrac{1}{6})^n$

defined at each of $6^n$ mass points in $R_n$ whose coordinates correspond to the
$6^n$ possible sequences of faces which could be obtained in throwing a die $n$ times.
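
For a small number of throws the mass points of the die example can be listed outright. The sketch below, with illustrative names, builds the p.f. of the sample for $n = 3$ as a product of the individual probabilities and confirms the count of mass points and the total probability.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
P_FACE = Fraction(1, 6)

def sample_pf(n: int) -> dict[tuple[int, ...], Fraction]:
    """p.f. of the sample: product of p(x) over the n throws, at every mass point."""
    return {point: P_FACE ** n for point in product(FACES, repeat=n)}

pf = sample_pf(3)
print(len(pf))            # number of mass points, 6**3
print(sum(pf.values()))   # total probability
```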

In the theory of sampling we are usually interested in the distribution
function of one or more functions of the $n$ random variables comprising
the sample; for example, the sample sum $z = \sum_{\xi=1}^{n} x_\xi$, the sample mean
$\bar{x} = \frac{1}{n} \sum_{\xi=1}^{n} x_\xi$, the sample variance $s^2 = \frac{1}{n-1} \sum_{\xi=1}^{n} (x_\xi - \bar{x})^2$, the smallest
sample element $\min(x_1, \ldots, x_n)$, the largest sample element
$\max(x_1, \ldots, x_n)$, etc. In general, if $g(x_1, \ldots, x_n)$ is such a function which is
itself a random variable, we are interested in determining its distribution
function. Such a function $g(x_1, \ldots, x_n)$ is called a statistic, whose c.d.f.,
say $H(y)$, is a special case of (2.8.12), that is,

(8.1.4)    $H(y) = P(g(x_1, \ldots, x_n) \le y) = \int_{E(y)} dF(x_1) \cdots dF(x_n)$

where $E(y)$ is the set in $R_n$ for which $g(x_1, \ldots, x_n) \le y$. The function
$H(y)$ is the c.d.f. of $g(x_1, \ldots, x_n)$.

Similarly, if $g_i(x_1, \ldots, x_n)$, $i = 1, \ldots, s \le n$, are $s$ (functionally
independent) statistics, we are interested in determining the c.d.f. of these
statistics. The c.d.f. of the $g_i(x_1, \ldots, x_n)$ is defined by

(8.1.5)    $H(y_1, \ldots, y_s) = P(g_i(x_1, \ldots, x_n) \le y_i;\ i = 1, \ldots, s)
            = \int_{E(y_1, \ldots, y_s)} dF(x_1) \cdots dF(x_n),$

where $E(y_1, \ldots, y_s)$ is the set in $R_n$ for which

    $g_i(x_1, \ldots, x_n) \le y_i, \quad i = 1, \ldots, s.$

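The random variable $x$ may be a $k$-dimensional random variable
$(x_1, \ldots, x_k)$ with c.d.f. $F(x_1, \ldots, x_k)$ in $R_k$, in which case the sample $O_n$
is an $nk$-dimensional random variable $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$ with
c.d.f.

(8.1.6)    $\prod_{\xi=1}^{n} F(x_{1\xi}, \ldots, x_{k\xi})$

in $R_{nk} = R_k^{(1)} \times \cdots \times R_k^{(n)}$, where $R_k^{(\xi)}$ is the sample space of $(x_{1\xi}, \ldots, x_{k\xi})$,
$\xi = 1, \ldots, n$. Again, the sampling distribution problem is to determine
the distribution function of one or more functions of the $nk$ random
variables $x_{i\xi}$, $i = 1, \ldots, k$; $\xi = 1, \ldots, n$. For example, we may be
interested in dealing with such statistics as sample sums $z_i = \sum_{\xi=1}^{n} x_{i\xi}$,
sample means $\bar{x}_i = \frac{1}{n} \sum_{\xi=1}^{n} x_{i\xi}$, elements

    $s_{ij} = \frac{1}{n-1} \sum_{\xi=1}^{n} (x_{i\xi} - \bar{x}_i)(x_{j\xi} - \bar{x}_j)$

of the sample covariance matrix, etc.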

Sometimes we have to deal with sampling theory of functions of two 
or more random samples. For instance, suppose $(x_{11}, \ldots, x_{1n_1})$ and
$(x_{21}, \ldots, x_{2n_2})$ are samples of size $n_1$ and $n_2$ respectively, from
populations having c.d.f.'s $F_1(x)$ and $F_2(x)$ respectively. This means that we
consider $(x_{11}, \ldots, x_{1n_1}, x_{21}, \ldots, x_{2n_2})$ as an $(n_1 + n_2)$-dimensional random
variable having c.d.f.

(8.1.7)    $\prod_{\xi_1=1}^{n_1} F_1(x_{1\xi_1}) \cdot \prod_{\xi_2=1}^{n_2} F_2(x_{2\xi_2})$

in $R_{n_1+n_2}$. The particular sampling problems with which we shall be
concerned will be the determination of the distribution of one or more
functions of the components of this $(n_1 + n_2)$-dimensional random
variable, or at least the determination of certain properties of such a 
distribution. Similar remarks hold for three or more samples. 

In the sampling distribution problems of mathematical statistics, we are 
usually interested in relatively simple statistics, such as averages, sums of 
squares, ratios, covariances, etc. Simple and explicit expressions exist for 
the p.f. or the p.d.f. of such sampling distributions only if population 
distributions have certain special forms, as we shall see in subsequent 
sections. However, one can determine means, variances, covariances, and 
some of the lower moments of these statistics for rather general population 
distributions by applying the results of Sections 3.3, 3.4, and 3.5. We
shall consider some of the more important results of this type in the next 
section. 

8.2 MEANS AND VARIANCES OF MEAN, VARIANCE, AND
OTHER SYMMETRIC FUNCTIONS OF A SAMPLE

(a) Mean and Variance of Sample Mean 

Suppose $(x_1, \ldots, x_n)$ is a sample from a population whose distribution
has mean $\mu$ and variance $\sigma^2$. Consider the problem of determining the
mean value and variance of the sample mean $\bar{x}$.

We have, by definition,

(8.2.1)    $\bar{x} = \frac{1}{n}(x_1 + \cdots + x_n).$

Taking the mean value of both sides of (8.2.1), we have

(8.2.2)    $\mathscr{E}(\bar{x}) = \frac{1}{n}[\mathscr{E}(x_1) + \cdots + \mathscr{E}(x_n)].$

But since $x_1, \ldots, x_n$ are random variables having identical c.d.f.'s, we
have

    $\mathscr{E}(x_1) = \cdots = \mathscr{E}(x_n) = \mu.$

Substituting these values in (8.2.2) we have

(8.2.3)    $\mathscr{E}(\bar{x}) = \mu.$

Without attempting a discussion of the basic concepts and principles of 
statistical estimation at this stage, we remark that (8.2.3) asserts that $\bar{x}$ is
an unbiased estimator for $\mu$. A treatment of estimators will be presented
in Chapters 10, 11, and 12. 

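Now consider the variance of $\bar{x}$. We have

    $\sigma^2(\bar{x}) = \mathscr{E}[\bar{x} - \mathscr{E}(\bar{x})]^2.$

But

    $\bar{x} - \mathscr{E}(\bar{x}) = \frac{1}{n} \sum_{\xi=1}^{n} (x_\xi - \mu).$

Hence

    $\sigma^2(\bar{x}) = \frac{1}{n^2} \sum_{\xi=1}^{n} \sum_{\eta=1}^{n} \mathscr{E}[(x_\xi - \mu)(x_\eta - \mu)].$

Since $x_1, \ldots, x_n$ are independent and all have the same distribution, we
have

(8.2.4)    $\mathscr{E}[(x_\xi - \mu)^2] = \sigma^2, \quad \xi = 1, \ldots, n$
           $\mathscr{E}[(x_\xi - \mu)(x_\eta - \mu)] = 0, \quad \xi \ne \eta.$

Therefore

(8.2.5)    $\sigma^2(\bar{x}) = \frac{\sigma^2}{n}.$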

(b) Mean and Variance of Sample Variance 

Now consider the sample variance $s^2$ which is defined as

(8.2.6)    $s^2 = \frac{1}{n-1} \sum_{\xi=1}^{n} (x_\xi - \bar{x})^2.$

We can write (8.2.6) as

(8.2.6a)   $s^2 = \frac{1}{n-1} \sum_{\xi} \left[(x_\xi - \mu) - \frac{1}{n} \sum_{\eta} (x_\eta - \mu)\right]^2$

               $= \frac{1}{n} \sum_{\xi} (x_\xi - \mu)^2 - \frac{1}{n(n-1)} \sum_{\xi \ne \eta} (x_\xi - \mu)(x_\eta - \mu).$

Taking mean values and using (8.2.4), we have

(8.2.7)    $\mathscr{E}(s^2) = \sigma^2.$

We remark at this point that the reason for using $n - 1$ as the divisor
in (8.2.6) rather than $n$ is to make $\mathscr{E}(s^2)$ exactly equal to $\sigma^2$, that is, to make
$s^2$ an unbiased estimator for $\sigma^2$.

Carrying out similar mean value operations we find after some reduction
that

(8.2.8)    $\mathscr{E}[(s^2)^2] = \frac{\mu_4}{n} + \frac{(n-1)^2 + 2}{n(n-1)}\, \sigma^4$

where $\mu_4$ is the fourth central moment of the population distribution.
Substituting from (8.2.7) and (8.2.8) in

    $\sigma^2(s^2) = \mathscr{E}[(s^2)^2] - [\mathscr{E}(s^2)]^2$

we find for the variance of $s^2$

(8.2.9)    $\sigma^2(s^2) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\, \sigma^4\right).$

We can summarize as follows: 

8.2.1 If $(x_1, \ldots, x_n)$ is a sample from a distribution with mean $\mu$ and
variance $\sigma^2$, the sample mean $\bar{x}$ has mean and variance

    $\mathscr{E}(\bar{x}) = \mu, \quad \sigma^2(\bar{x}) = \frac{\sigma^2}{n}.$

Furthermore, if the fourth moment of the population distribution is
finite, then the sample variance $s^2$ has mean and variance

    $\mathscr{E}(s^2) = \sigma^2, \quad \sigma^2(s^2) = \frac{1}{n}\left(\mu_4 - \frac{n-3}{n-1}\, \sigma^4\right).$
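
The four formulas of 8.2.1 can be confirmed exactly for a small discrete population by summing over every possible sample. The sketch below is only an illustration: the three-point population, the sample size $n = 4$, and all function names are hypothetical choices, and exact rational arithmetic is used so that the comparisons are identities.

```python
from fractions import Fraction
from itertools import product

# Hypothetical three-point population, chosen only for illustration.
POPULATION = {0: Fraction(1, 2), 1: Fraction(1, 3), 3: Fraction(1, 6)}

def expectation(n, statistic):
    """Exact mean value of statistic(sample) over all samples of size n."""
    total = Fraction(0)
    for sample in product(POPULATION, repeat=n):
        prob = Fraction(1)
        for x in sample:
            prob *= POPULATION[x]
        total += prob * statistic(sample)
    return total

def sample_mean(sample):
    return Fraction(sum(sample), len(sample))

def sample_variance(sample):
    m = sample_mean(sample)
    return sum((x - m) ** 2 for x in sample) / (len(sample) - 1)

n = 4
mu = sum(p * x for x, p in POPULATION.items())
sigma2 = sum(p * (x - mu) ** 2 for x, p in POPULATION.items())
mu4 = sum(p * (x - mu) ** 4 for x, p in POPULATION.items())

mean_of_mean = expectation(n, sample_mean)
var_of_mean = expectation(n, lambda s: (sample_mean(s) - mu) ** 2)
mean_of_s2 = expectation(n, sample_variance)
var_of_s2 = expectation(n, lambda s: (sample_variance(s) - sigma2) ** 2)

print(mean_of_mean == mu, var_of_mean == sigma2 / n)
print(mean_of_s2 == sigma2)
print(var_of_s2 == (mu4 - Fraction(n - 3, n - 1) * sigma2 ** 2) / n)
```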

The means and variances of higher sample moments can be found
in a similar manner, although the tediousness of the process increases
rapidly. The reader interested in further results should consult Kendall
(1943).

(c) Fisher's k-Statistics

For some purposes the semi-invariants of a distribution are more 
convenient to deal with than the moments of the distribution. In this 
section we shall consider certain unbiased estimators of the semi-invariants 
of a distribution from a sample from the distribution. 

Fisher (1928a) has devised functions $k_r(x_1, \ldots, x_n)$, $r = 1, 2, \ldots$,
where $k_r(x_1, \ldots, x_n)$ is the most general homogeneous polynomial of
degree $r$ in $x_1, \ldots, x_n$ subject to the conditions that (i) $k_r(x_1, \ldots, x_n)$ is
symmetric in $x_1, \ldots, x_n$, (ii) $\mathscr{E}(k_r) = \kappa_r$ where $\kappa_r$ is the $r$th semi-invariant
(cumulant) of the distribution from which the sample is drawn as defined
in (5.1.11). Such functions are called k-statistics. It is sufficient to
indicate briefly how the coefficients are determined by considering the
first three k-statistics. The reader interested in further details about the
construction and sampling theory of k-statistics and related problems
should consult Craig (1928), Fisher (1928a), Cornish and Fisher (1937),
Dwyer (1933), and Kendall (1943).

The first three k-statistics are seen to be of the form

           $k_1 = a_{11} \sum_{\xi} x_\xi$

(8.2.10)   $k_2 = a_{21} \sum_{\xi} x_\xi^2 + a_{22} \sum_{\xi \ne \eta} x_\xi x_\eta$

           $k_3 = a_{31} \sum_{\xi} x_\xi^3 + a_{32} \sum_{\xi \ne \eta} x_\xi^2 x_\eta + a_{33} \sum_{\xi \ne \eta \ne \zeta} x_\xi x_\eta x_\zeta.$

By taking mean values of these three quantities and equating them to
the first three semi-invariants of the population distribution, namely

           $\kappa_1 = \mu_1' = \mu$

(8.2.11)   $\kappa_2 = \mu_2' - (\mu_1')^2 = \sigma^2$

           $\kappa_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$

where $\mu_1'$, $\mu_2'$, and $\mu_3'$ are the first three moments of the population
distribution, we determine the constants. Inserting the values of the constants
thus found in (8.2.10) we have as the first three k-statistics

           $k_1 = \frac{1}{n} z_1 = \bar{x}$

(8.2.12)   $k_2 = \frac{1}{n(n-1)} (n z_2 - z_1^2) = s^2$

           $k_3 = \frac{1}{n(n-1)(n-2)} (n^2 z_3 - 3n z_2 z_1 + 2 z_1^3),$

where $z_r = \sum_{\xi} x_\xi^r$, $r = 1, 2, 3$.

Note that the first two k-statistics are simply $\bar{x}$ and $s^2$ whose mean values
are $\mu$ and $\sigma^2$, these being the first two semi-invariants $\kappa_1$ and $\kappa_2$.
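
The first three k-statistics can be computed directly from the power sums $z_r = \sum_\xi x_\xi^r$ using $k_1 = z_1/n$, $k_2 = (n z_2 - z_1^2)/[n(n-1)]$, and $k_3 = (n^2 z_3 - 3n z_2 z_1 + 2 z_1^3)/[n(n-1)(n-2)]$. The sketch below does so in exact arithmetic and also checks, by summing over all samples of size 4 from a hypothetical three-point population, that the mean value of $k_3$ equals $\kappa_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$. The population and the function names are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

def k_statistics(sample):
    """First three k-statistics from the power sums z_r (requires n >= 3)."""
    n = len(sample)
    z1, z2, z3 = (sum(Fraction(x) ** r for x in sample) for r in (1, 2, 3))
    k1 = z1 / n
    k2 = (n * z2 - z1 ** 2) / (n * (n - 1))
    k3 = (n ** 2 * z3 - 3 * n * z2 * z1 + 2 * z1 ** 3) / (n * (n - 1) * (n - 2))
    return k1, k2, k3

# Hypothetical population used only to illustrate the unbiasedness of k3.
POPULATION = {0: Fraction(1, 2), 1: Fraction(1, 3), 3: Fraction(1, 6)}

def raw_moment(r):
    return sum(p * Fraction(x) ** r for x, p in POPULATION.items())

kappa3 = raw_moment(3) - 3 * raw_moment(2) * raw_moment(1) + 2 * raw_moment(1) ** 3

n = 4
mean_k3 = Fraction(0)
for sample in product(POPULATION, repeat=n):
    prob = Fraction(1)
    for x in sample:
        prob *= POPULATION[x]
    mean_k3 += prob * k_statistics(sample)[2]

print(k_statistics([1, 2, 4, 7]))
print(mean_k3 == kappa3)
```

For the sample $(1, 2, 4, 7)$ the second value returned is 7, which agrees with the sample variance computed from its definition.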

(d) Mean and Variance of Certain Symmetric Functions of a Sample 

It will be noted that the sample mean $\bar{x}$, the sample variance $s^2$, the
sample moments, and the sample k-statistics are all symmetric functions of
the elements of a sample $(x_1, \ldots, x_n)$. Now let us consider the problem
of determining the mean and variance of a more general class of
symmetric functions of a sample.

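Suppose $(x_1, \ldots, x_r)$ is a sample of size $r$ from a c.d.f. $F(x)$, and let
$g(x_1, \ldots, x_r)$ be a function of $(x_1, \ldots, x_r)$ such that

(8.2.13)   $\mathscr{E}(g^j(x_1, \ldots, x_r)) = \theta_j, \quad j = 1, 2,$

where $\theta_1$ and $\theta_2$ are both finite. For a sample $(x_1, \ldots, x_n)$ of size $n$,
$n > r$, let

(8.2.14)   $g_0(x_{\eta_1}, \ldots, x_{\eta_r}) = \frac{1}{r!} \sum_P g(x_{\xi_1}, \ldots, x_{\xi_r})$

where $\eta_1, \ldots, \eta_r$ is a selection of $r$ of the integers $1, \ldots, n$ with $\eta_1 <
\cdots < \eta_r$ and where $\sum_P$ denotes summation over all $r!$ permutations
$(\xi_1, \ldots, \xi_r)$ of $(\eta_1, \ldots, \eta_r)$. We note that $g_0(x_{\eta_1}, \ldots, x_{\eta_r})$ is a symmetric
function of $x_{\eta_1}, \ldots, x_{\eta_r}$. Let

(8.2.15)   $Q_{[r]}(x_1, \ldots, x_n) = \binom{n}{r}^{-1} \sum_C g_0(x_{\eta_1}, \ldots, x_{\eta_r})$

where $\sum_C$ denotes summation over all selections of $(\eta_1, \ldots, \eta_r)$ out of
$(1, \ldots, n)$. $Q_{[r]}(x_1, \ldots, x_n)$ is a symmetric function of the sample
components $(x_1, \ldots, x_n)$. Now $\mathscr{E}(g(x_{\xi_1}, \ldots, x_{\xi_r}))$, which is assumed to
be finite, has the same value, namely $\theta_1$, for all permutations $(x_{\xi_1}, \ldots, x_{\xi_r})$
of $r$ elements of the sample $(x_1, \ldots, x_n)$. Furthermore, it is evident that

(8.2.16)   $\mathscr{E}(g_0(x_{\eta_1}, \ldots, x_{\eta_r})) = \mathscr{E}(g(x_{\xi_1}, \ldots, x_{\xi_r})) = \theta_1$

and also

(8.2.17)   $\mathscr{E}(Q_{[r]}(x_1, \ldots, x_n)) = \mathscr{E}(g_0(x_{\eta_1}, \ldots, x_{\eta_r})) = \theta_1.$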

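Now consider the variance of $Q_{[r]}(x_1, \ldots, x_n)$. We have

(8.2.18)   $\sigma^2(Q_{[r]}) = \mathscr{E}(Q_{[r]}^2(x_1, \ldots, x_n)) - \theta_1^2.$

But

(8.2.19)   $\mathscr{E}(Q_{[r]}^2(x_1, \ldots, x_n)) = \binom{n}{r}^{-2} \sum_{C'} \mathscr{E}(g_0(x_{\eta_1}, \ldots, x_{\eta_r})\, g_0(x_{\zeta_1}, \ldots, x_{\zeta_r}))$

where $\eta_1, \ldots, \eta_r$ and $\zeta_1, \ldots, \zeta_r$ are two selections from the integers
$1, \ldots, n$ with $\eta_1 < \cdots < \eta_r$ and $\zeta_1 < \cdots < \zeta_r$, and where $\sum_{C'}$ denotes
summation over all pairs of such selections. It is evident that if $\eta_1, \ldots, \eta_r$
and $\zeta_1, \ldots, \zeta_r$ is a pair of selections with $j$ integers in common, then
$\mathscr{E}(g_0(x_{\eta_1}, \ldots, x_{\eta_r})\, g_0(x_{\zeta_1}, \ldots, x_{\zeta_r}))$ has the same value, say $\varphi_j$, for every
such pair of selections. It follows from Schwarz' inequality that $\varphi_j$,
$j = 0, 1, \ldots, r$ are all finite if $\theta_1$ and $\theta_2$ in (8.2.13) are finite. It is evident
from combinatorial considerations that there are

    $\binom{n}{r}\binom{r}{j}\binom{n-r}{r-j}$

such pairs of selections, and $j$ can take on the values $0, 1, \ldots, r$. Therefore
we have

(8.2.20)   $\mathscr{E}(Q_{[r]}^2(x_1, \ldots, x_n)) = \binom{n}{r}^{-1} \sum_{j=0}^{r} \binom{r}{j}\binom{n-r}{r-j} \varphi_j.$

Thus, we obtain

(8.2.21)   $\sigma^2(Q_{[r]}) = \binom{n}{r}^{-1} \sum_{j=0}^{r} \binom{r}{j}\binom{n-r}{r-j} \varphi_j - \theta_1^2.$

If $g(x_1, \ldots, x_r)$ is chosen so that $\theta_1 = 0$, then since $\varphi_0 = \theta_1^2$, $\sigma^2(Q_{[r]})$
reduces to

(8.2.22)   $\sigma^2(Q_{[r]}) = \binom{n}{r}^{-1} \sum_{j=1}^{r} \binom{r}{j}\binom{n-r}{r-j} \varphi_j,$

a result due to Hoeffding (1948a).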
Summarizing we have the following result:

8.2.2 Suppose $(x_1, \ldots, x_r)$ is a sample from an arbitrary distribution, and
let $g(x_1, \ldots, x_r)$ be a function which has finite first and second
moments $\theta_1$ and $\theta_2$. For a sample $(x_1, \ldots, x_n)$, $n > r$, from the
same distribution, let $Q_{[r]}(x_1, \ldots, x_n)$ be defined according to
(8.2.15). Then $Q_{[r]}(x_1, \ldots, x_n)$ is an unbiased estimator for $\theta_1$
which has variance given by (8.2.21).
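
The construction of $Q_{[r]}$ can be made concrete with a short computation. The sketch below follows the two-stage definition, first averaging $g$ over the $r!$ orderings of each selection and then averaging over all $\binom{n}{r}$ selections. As an example it uses the kernel $g(x_1, x_2) = \tfrac{1}{2}(x_1 - x_2)^2$ with $r = 2$, whose mean value is $\sigma^2$; this kernel is a standard illustration and is not taken from the text, and the function names are assumptions.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, factorial

def u_statistic(sample, g, r):
    """Q_[r]: average over all selections of r elements of the symmetrized g."""
    def g0(selection):
        # Average g over the r! orderings of the selected elements.
        return sum(Fraction(g(*p)) for p in permutations(selection)) / factorial(r)
    total = sum(g0(sel) for sel in combinations(sample, r))
    return total / comb(len(sample), r)

def half_squared_difference(a, b):
    """Kernel whose mean value is sigma^2 for two independent drawings."""
    return Fraction((a - b) ** 2, 2)

data = [1, 2, 4, 7]
print(u_statistic(data, half_squared_difference, 2))
```

With this kernel the statistic coincides with the sample variance $s^2$, so that the unbiasedness asserted for $Q_{[r]}$ contains $\mathscr{E}(s^2) = \sigma^2$ as a special case.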

Hoeffding has shown that if $\theta_1$ and $\theta_2$ are finite and $\varphi_1 > \theta_1^2$, the limiting
distribution of $(Q_{[r]} - \theta_1)/\sigma(Q_{[r]})$, as $n \to \infty$, is $N(0, 1)$. He extends
this result as well as 8.2.2 to the case where $Q_{[r]}(x_1, \ldots, x_n)$ is a vector.


8.3 SAMPLING THEORY OF SAMPLE SUMS AND MEANS 


(a) An Iterative Method 

We shall first consider an iterative method for finding the distribution 
function of a sample sum or mean. It furnishes a direct approach to the 
problem which is fairly straightforward in some specific cases. 

Consider the case of samples of size 2. If we let $G_2(z)$ be the c.d.f. of $z$,
we have

(8.3.1)    $G_2(z) = P(x_1 + x_2 \le z) = \int_E d(F(x_1)F(x_2))$

where $E$ is the event in $R_2$ for which $x_1 + x_2 \le z$. If we apply 3.7.2 for the
case in which $x_1$ and $x_2$ are independent, and put $g(x_1, x_2) = 1$ for all
points in $E$ and 0 for all points in $\bar{E}$, we can write

(8.3.2)    $\int_E d(F(x_1)F(x_2)) = \int_{-\infty}^{\infty} \left[\int_{-\infty}^{z - x_1} dF(x_2)\right] dF(x_1) = \int_{-\infty}^{\infty} F(z - x_1)\, dF(x_1).$

Hence, for samples of size 2, we have

(8.3.3)    $G_2(z) = \int_{-\infty}^{\infty} F(z - x_1)\, dF(x_1).$

Extending the argument to samples of size $n$, and denoting the c.d.f. of
$z$ by $G_n(z)$, one finds similarly, that $G_n(z)$ can be expressed as an iterated
integral as follows:

(8.3.4)    $G_n(z) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} F(z - x_1 - \cdots - x_{n-1})\, dF(x_{n-1})\, dF(x_{n-2}) \cdots dF(x_1).$

The c.d.f. of the sample mean $\bar{x}$, say $H_n(\bar{x})$, is given by the relation
$H_n(\bar{x}) \equiv G_n(n\bar{x})$. Therefore,

8.3.1 If $(x_1, \ldots, x_n)$ is a sample of size $n$ from the c.d.f. $F(x)$, then
the c.d.f. of the sample sum $z$, $G_n(z)$, is given by (8.3.4), and $H_n(\bar{x})$, the
c.d.f. of the sample mean $\bar{x}$, is given by $H_n(\bar{x}) = G_n(n\bar{x})$.

The c.d.f. $G_n(z)$ is sometimes referred to as the convolution of the c.d.f.'s
$F(x_1), \ldots, F(x_n)$. The process applies in a straightforward manner so as
to produce the convolution of $n$ independent random variables with
arbitrary distributions $F_1(x_1), \ldots, F_n(x_n)$. The convolution in this case,
of course, is the c.d.f. of the sum of the $n$ independent random variables.

In the case of a sample from a population having a discrete distribution
with p.f. $p(x)$, it can be verified that the p.f. of $z$, say $p_n(z)$, satisfies the
following equation:

(8.3.4a)   $p_n(z) = \sum_x p(z - x)\, p_{n-1}(x)$

where $p_1(x) = p(x)$.

Similarly, if we have a sample from a population having a continuous
distribution with p.d.f. $f(x)$, the p.d.f. of $z$, say $f_n(z)$, is given by

(8.3.4b)   $f_n(z) = \int_{-\infty}^{\infty} f(z - x)\, f_{n-1}(x)\, dx$

where $f_1(x) = f(x)$.
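
The discrete recursion, $p_n(z) = \sum_x p(z - x)\,p_{n-1}(x)$ with $p_1 = p$, translates almost line for line into a short program. The sketch below stores a p.f. as a dictionary from mass points to probabilities and applies the recursion to the sum of the faces in throws of a true die; the function names are illustrative.

```python
from fractions import Fraction

def convolve(p, q):
    """p.f. of the sum of two independent variables with p.f.'s p and q."""
    out = {}
    for x, px in p.items():
        for y, qy in q.items():
            out[x + y] = out.get(x + y, Fraction(0)) + px * qy
    return out

def sum_pf(p, n):
    """p_n, obtained by applying the recursion n - 1 times starting from p_1 = p."""
    current = dict(p)
    for _ in range(n - 1):
        current = convolve(p, current)
    return current

die = {face: Fraction(1, 6) for face in range(1, 7)}
p2 = sum_pf(die, 2)
print(p2[7], sum(p2.values()))
```

For two throws the program gives probability $\tfrac{1}{6}$ to a total of 7, as one finds by counting the six favorable sequences among 36.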

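Example. An interesting special case of (8.3.4b) is that in which $f(x)$ is the
p.d.f. of the rectangular distribution $R(\tfrac{1}{2}, 1)$. In this case we have

    $f_1(z) = 0, \quad z < 0$
    $\phantom{f_1(z)} = 1, \quad 0 < z < 1$
    $\phantom{f_1(z)} = 0, \quad z > 1$

    $f_2(z) = 0, \quad z < 0$
    $\phantom{f_2(z)} = z, \quad 0 < z < 1$
    $\phantom{f_2(z)} = z - 2(z - 1), \quad 1 < z < 2$
    $\phantom{f_2(z)} = 0, \quad z > 2$

and by mathematical induction it follows that

    $f_n(z) = \frac{1}{(n-1)!}\left[z^{n-1} - \binom{n}{1}(z - 1)^{n-1} + \binom{n}{2}(z - 2)^{n-1} - \cdots + (-1)^k \binom{n}{k}(z - k)^{n-1}\right]$

for $k < z < k + 1$, $k = 0, 1, \ldots, n - 1$, $f_n(z) = 0$ for $z < 0$ and $z > n$. The
p.d.f. of $\bar{x}$ is seen to be $n f_n(n\bar{x})$ for $k/n < \bar{x} < (k + 1)/n$, $k = 0, 1, \ldots, n - 1$.
This result is due to Laplace (1814).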
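
The closed form for the p.d.f. of the sum of $n$ rectangular variables, $\frac{1}{(n-1)!}\sum_{j=0}^{k} (-1)^j \binom{n}{j} (z - j)^{n-1}$ for $k < z < k + 1$, is easily evaluated. The sketch below assumes that $R(\tfrac{1}{2}, 1)$ is the rectangular distribution on $(0, 1)$; the function names are illustrative.

```python
from math import comb, factorial, floor

def uniform_sum_pdf(z: float, n: int) -> float:
    """p.d.f. of the sum of n independent R(1/2, 1) variables, from the closed form."""
    if z <= 0.0 or z >= n:
        return 0.0
    k = floor(z)
    terms = ((-1) ** j * comb(n, j) * (z - j) ** (n - 1) for j in range(k + 1))
    return sum(terms) / factorial(n - 1)

def mean_pdf(x: float, n: int) -> float:
    """p.d.f. of the sample mean, n * f_n(n x)."""
    return n * uniform_sum_pdf(n * x, n)

print(uniform_sum_pdf(1.5, 2), uniform_sum_pdf(1.5, 3))
```

For $n = 2$ and $z = 1.5$ the formula gives $1.5 - 2(0.5) = 0.5$, in agreement with the triangular density $f_2$.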
(b) Application of Characteristic Functions

Characteristic functions furnish a simple and powerful method of 
determining sampling distribution functions of sums and means in many 
cases. 

If we refer to 5.3.2 it is evident that we have the following corollary to
that theorem:

8.3.2 If $(x_1, \ldots, x_n)$ is a sample from the c.d.f. $F(x)$ and if the characteristic
function of $F(x)$ is $\varphi(t)$, then the characteristic function of the sample
sum $z$ is

    $[\varphi(t)]^n$

and the characteristic function of the sample mean $\bar{x}$ is

    $[\varphi(t/n)]^n.$

The c.d.f. (and p.d.f. in the continuous case) of $z$ or $\bar{x}$ can be determined,
theoretically at least, from their characteristic functions by the application
of 5.1.2. In actual applications, except in certain special cases, the evaluation
of the integral on the right of (5.1.14) or of (5.1.15) is complicated.
Irwin (1930) has made extensive use of this technique in determining
distributions of sample means from various distributions. However, in
many important special cases which arise in sampling theory, we can
determine the distribution functions of $z$ and $\bar{x}$ by utilizing 5.3.3 concerning
the reproductive property of c.d.f.'s. For suppose $(x_1, \ldots, x_n)$ is a sample
from a c.d.f. $F(x; \theta)$ having characteristic function $\varphi(t; \theta)$. If $z$ is the sample
sum then we know from 8.3.2 that the characteristic function of $z$ is
$[\varphi(t; \theta)]^n$. But if $\varphi(t; \theta)$ satisfies the criterion of reproductivity with respect
to $\theta$ as expressed by (5.3.7) then

(8.3.5)    $[\varphi(t; \theta)]^n = \varphi(t; n\theta).$

This implies that $z$ has the c.d.f. $F(z; n\theta)$. By setting $z = n\bar{x}$ where $\bar{x}$ is
the sample mean, it is seen that the c.d.f. of $\bar{x}$ is $F(n\bar{x}; n\theta)$.

We may summarize as follows: 

8.3.3 Suppose $(x_1, \ldots, x_n)$ is a sample from the c.d.f. $F(x; \theta)$, and the
characteristic function of $F(x; \theta)$ is $\varphi(t; \theta)$. Then if $[\varphi(t; \theta)]^n =
\varphi(t; n\theta)$ the c.d.f.'s of the sampling distributions of the sample sum $z$
and sample mean $\bar{x}$ are $F(z; n\theta)$ and $F(n\bar{x}; n\theta)$ respectively.

It should be noted that an extension of this statement holds for sampling
from a population having a $k$-variate distribution. In this case $z$ is a
vector $(z_1, \ldots, z_k)$ where $z_i = \sum_{\xi=1}^{n} x_{i\xi}$. The parameter $\theta$ can also be a vector
with several components.
The following corollaries of 8.3.3 give information about the sampling
theory of sample sums for certain important special distributions, which 
will be useful in later chapters. 

8.3.3a If $(x_1, \ldots, x_n)$ is a sample from the binomial distribution $Bi(m, p)$,
the sampling distribution of $z$ is the binomial distribution $Bi(mn, p)$.

For the characteristic function of the binomial distribution $Bi(m, p)$ is
$(q + e^{it}p)^m$ from which it follows that the characteristic function of $z$ is
$(q + e^{it}p)^{mn}$ and hence the sampling distribution of $z$ is $Bi(mn, p)$. Note
that for $m = 1$, we are sampling from the binomial population having p.f.

(8.3.6) $p(x) = p^x q^{1-x}, \quad x = 0, 1,$

which means that the binomial distribution having p.f. (6.2.2) is essentially
the distribution of a sample sum $z = \sum_{\xi=1}^{n} x_\xi$, each $x_\xi$ being 0 or 1 with
probabilities $q$ or $p$, respectively.
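
The reproductive property in 8.3.3a can also be checked directly, without characteristic functions, by convolving the p.f. of $Bi(m, p)$ with itself $n$ times. The following is a minimal sketch in exact rational arithmetic; the function names are illustrative and do not come from the text.

```python
from fractions import Fraction
from math import comb


def binomial_pf(m, p):
    """p.f. of Bi(m, p) as a list indexed by x = 0, ..., m."""
    q = 1 - p
    return [comb(m, x) * p**x * q**(m - x) for x in range(m + 1)]


def convolve(f, g):
    """p.f. of the sum of two independent variables with p.f.'s f and g."""
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h


def sample_sum_pf(m, p, n):
    """p.f. of z = x_1 + ... + x_n for a sample of size n from Bi(m, p)."""
    pf = binomial_pf(m, p)
    z = pf
    for _ in range(n - 1):
        z = convolve(z, pf)
    return z


# The sum of n = 4 observations from Bi(2, 1/3) has the p.f. of Bi(8, 1/3).
p = Fraction(1, 3)
assert sample_sum_pf(2, p, 4) == binomial_pf(8, p)
```

Taking `m = 1` reproduces the remark above: the sum of $n$ zero-one variables has the p.f. of $Bi(n, p)$.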

8.3.3b If $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$ is a sample from the multinomial
distribution $M(m; p_1, \ldots, p_k)$, the sampling distribution of the
vector of sample sums $(z_1, \ldots, z_k)$ is the multinomial distribution
$M(mn; p_1, \ldots, p_k)$.

The verification of this statement is similar to that for 8.3.3a. Note
that for $m = 1$, the multinomial distribution $M(n; p_1, \ldots, p_k)$ is itself a
sampling distribution of sample sums from the $k$-variate multinomial
distribution $M(1; p_1, \ldots, p_k)$ having p.f.

(8.3.7) $p(x_1, \ldots, x_k) = p_1^{x_1} \cdots p_k^{x_k} (1 - p_1 - \cdots - p_k)^{1 - x_1 - \cdots - x_k}$

where $x_1, \ldots, x_k$ are random variables such that $x_1 + \cdots + x_k \le 1$, each
$x$ being 0 or 1.

8.3.3c If $(x_1, \ldots, x_n)$ is a sample from the Poisson distribution $Po(\mu)$, the
sampling distribution of $z$ is the Poisson distribution $Po(n\mu)$.

For the characteristic function of the Poisson distribution $Po(\mu)$ is
$e^{\mu(e^{it} - 1)}$, from which it follows that the characteristic function of $z$ is
$e^{n\mu(e^{it} - 1)}$. But this is the characteristic function of the Poisson distribution
$Po(n\mu)$, which implies that $z$ has the Poisson distribution $Po(n\mu)$.

8.3.3d If $(x_1, \ldots, x_n)$ is a sample from the normal distribution $N(\mu, \sigma^2)$
the sampling distribution of $z$ is $N(n\mu, n\sigma^2)$. The sampling
distribution of $\bar{x}$ is $N(\mu, \sigma^2/n)$.

For we recall from Section 7.2 that the characteristic function of the
normal distribution $N(\mu, \sigma^2)$ is $e^{i\mu t - \sigma^2 t^2/2}$, whereas that of $z$ is
$e^{in\mu t - n\sigma^2 t^2/2}$.
But the latter is the characteristic function of $N(n\mu, n\sigma^2)$, which is therefore
the sampling distribution of $z$. The characteristic function of $\bar{x}$ is obtained
by replacing $t$ by $t/n$ in the characteristic function of $z$, which gives
$e^{i\mu t - \sigma^2 t^2/(2n)}$. This is the characteristic function of $N(\mu, \sigma^2/n)$, which is
therefore the sampling distribution of $\bar{x}$.

The corresponding statement for the $k$-variate normal distribution is:

8.3.3e If $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$ is a sample from the $k$-variate
normal distribution $N(\{\mu_i\}, \|\sigma_{ij}\|)$, the sampling distribution of the
vector of sample sums $(z_1, \ldots, z_k)$ is $N(\{n\mu_i\}, \|n\sigma_{ij}\|)$. The
sampling distribution of the vector of sample means $(\bar{x}_1, \ldots, \bar{x}_k)$ is
$N(\{\mu_i\}, \|\sigma_{ij}/n\|)$.

The verification of 8.3.3e is similar to that for 8.3.3d and is left as an 
exercise for the reader. 

Finally, we have the following further corollary of 8.3.3 for the gamma 
distribution which is left to the reader for verification: 


8.3.3f If $(x_1, \ldots, x_n)$ is a sample from the gamma distribution $G(\mu)$ the
sampling distribution of $z$ is the gamma distribution $G(n\mu)$.

If the distribution function of the population from which the sample
$(x_1, \ldots, x_n)$ is drawn is not reproductive, the characteristic function is not
very useful, in general, for finding the distribution function of the sample
sum or mean.

Sometimes we have to deal with linear functions of means of samples 
from normal populations. The following statement gives a useful result 
concerning the distribution of such a linear function: 


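8.3.4 Suppose $(x_{11}, \ldots, x_{1n_1}), \ldots, (x_{k1}, \ldots, x_{kn_k})$ are $k$ (independent)
samples from $N(\mu_1, \sigma_1^2), \ldots, N(\mu_k, \sigma_k^2)$ respectively. Let $\bar{x}_1, \ldots, \bar{x}_k$
be means of these samples, respectively, and let $c_1, \ldots, c_k$ be
constants, not all zero. Then $c_1\bar{x}_1 + \cdots + c_k\bar{x}_k$ has as its sampling
distribution $N\left(\sum_{i=1}^{k} c_i\mu_i,\ \sum_{i=1}^{k} c_i^2\sigma_i^2/n_i\right)$.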






Verification of this statement by using characteristic functions is 
straightforward and is left as an exercise for the reader. Note particularly
that if $c_1 = 1$, $c_2 = -1$, and $c_3 = \cdots = c_k = 0$, we obtain the corollary
that


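8.3.4a The sampling distribution of the difference $\bar{x}_1 - \bar{x}_2$ between the means
of two (independent) samples $(x_{11}, \ldots, x_{1n_1})$ and $(x_{21}, \ldots, x_{2n_2})$
from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ respectively, is $N(\mu_1 - \mu_2,\ \sigma_1^2/n_1 + \sigma_2^2/n_2)$.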
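
As a small illustration of 8.3.4, the parameters of the normal distribution of a linear function of sample means can be computed directly from the constants, population parameters and sample sizes. This is only a sketch; the function name and the numerical values are assumptions made for the example.

```python
def linear_combination_distribution(c, mu, sigma2, n):
    """Return (mean, variance) of c_1*xbar_1 + ... + c_k*xbar_k as in 8.3.4.

    c      -- constants c_1, ..., c_k (not all zero)
    mu     -- population means mu_1, ..., mu_k
    sigma2 -- population variances sigma_1^2, ..., sigma_k^2
    n      -- sample sizes n_1, ..., n_k
    """
    if not any(c):
        raise ValueError("the constants c_i must not all be zero")
    mean = sum(ci * mi for ci, mi in zip(c, mu))
    variance = sum(ci**2 * s2 / ni for ci, s2, ni in zip(c, sigma2, n))
    return mean, variance


# Difference of two means (c_1 = 1, c_2 = -1), the case of 8.3.4a:
# mean = 10 - 8, variance = 4/16 + 9/36.
mean, variance = linear_combination_distribution(
    [1, -1], [10.0, 8.0], [4.0, 9.0], [16, 36]
)
```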

8.4 SAMPLING THEORY OF CERTAIN QUADRATIC FORMS
IN SAMPLES FROM A NORMAL DISTRIBUTION 

The essential facts concerning the sampling theory of sums and means 
of samples from normal distributions have been given in 8.3.3d, 8.3.3e, 
8.3.4, and 8.3.4a. The present section will deal with the sampling theory 
of sums of squares, sample variances, and other quadratic forms of 
samples from one-dimensional normal distributions. The corresponding 
sampling theory for samples from $k$-variate normal distributions is more
complicated; it belongs to normal multivariate analysis and will be 
treated in Chapter 18. 

One of the basic results in the sampling theory of sums of squares of 
sample elements is the following: 

8.4.1 If $(x_1, \ldots, x_n)$ is a sample from the normal distribution $N(\mu, \sigma^2)$ the
sampling distribution of the sum of squares $\frac{1}{\sigma^2}\sum_{\xi=1}^{n}(x_\xi - \mu)^2$ is the
chi-square distribution $C(n)$.

To verify this statement it is sufficient to note that if the p.d.f. of the
normal distribution $N(\mu, \sigma^2)$ is inserted in (8.1.3) then
$\frac{1}{\sigma^2}\sum_{\xi=1}^{n}(x_\xi - \mu)^2$
is the quadratic form which appears in the exponent of the p.d.f. of the
sample. It follows from 7.8.2 that this quadratic form has the chi-square
distribution $C(n)$. The result given in 8.4.1 was originally obtained by
Helmert (1876a).

The sum of squares which appears in the definition (8.2.6) of the sample
variance $s^2$ is $\sum_{\xi=1}^{n}(x_\xi - \bar{x})^2$ where $\bar{x}$ is the sample mean. The sampling
theory of this sum of squares in samples from a normal distribution can be
stated as follows:

8.4.2 If $(x_1, \ldots, x_n)$ is a sample from the normal distribution $N(\mu, \sigma^2)$,
then $\sqrt{n}(\bar{x} - \mu)/\sigma$ and $(n - 1)s^2/\sigma^2$ are (statistically) independent
and their sampling distributions are the normal distribution $N(0, 1)$
and the chi-square distribution $C(n - 1)$ respectively.

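We have already seen from 8.3.3d that $\bar{x}$ has the distribution $N(\mu, \sigma^2/n)$,
which is equivalent to the statement that $\sqrt{n}(\bar{x} - \mu)/\sigma$ has the distribution
$N(0, 1)$. To establish 8.4.2 it remains to show that $\sqrt{n}(\bar{x} - \mu)/\sigma$ and
$(n - 1)s^2/\sigma^2$ are independent and that the latter has the chi-square
distribution $C(n - 1)$. To do this, let us set up the characteristic function
of these two quantities, namely,

(8.4.1)
$\varphi(t_1, t_2) = \mathcal{E}\left[\exp\left(it_1\frac{\sqrt{n}(\bar{x} - \mu)}{\sigma} + it_2\frac{(n-1)s^2}{\sigma^2}\right)\right]$
$= \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} \exp\left\{-\tfrac{1}{2}\left[Q_0 - \frac{2it_1\sqrt{n}}{\sigma}(\bar{x} - \mu)\right]\right\} dx_1 \cdots dx_n$

where

(8.4.2) $Q_0(x_1, \ldots, x_n) = \frac{1}{\sigma^2}\sum_{\xi=1}^{n}(x_\xi - \mu)^2 - \frac{2it_2}{\sigma^2}\sum_{\xi=1}^{n}(x_\xi - \bar{x})^2.$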


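Putting

(8.4.3)
$\tau_{\xi\xi} = \frac{1}{\sigma^2}\left[1 - 2it_2\left(1 - \frac{1}{n}\right)\right], \quad \xi = 1, \ldots, n$
$\tau_{\xi\eta} = \frac{2it_2}{n\sigma^2}, \quad \xi \ne \eta = 1, \ldots, n$

we have

(8.4.4) $Q_0(x_1, \ldots, x_n) = \sum_{\xi,\eta=1}^{n} \tau_{\xi\eta}(x_\xi - \mu)(x_\eta - \mu).$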




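By using (7.4.15), we note that

(8.4.5) $Q_0(x_1, \ldots, x_n) - \frac{2it_1\sqrt{n}}{\sigma}(\bar{x} - \mu) = Q_0'(x_1, \ldots, x_n) + \frac{t_1^2}{n\sigma^2}\sum_{\xi,\eta=1}^{n}\tau^{\xi\eta}$

where

(8.4.6) $Q_0'(x_1, \ldots, x_n) = Q_0(x_1 - ig_1, \ldots, x_n - ig_n)$

and

$g_\xi = \frac{t_1}{\sigma\sqrt{n}}\sum_{\eta=1}^{n}\tau^{\xi\eta}, \quad \xi = 1, \ldots, n,$

$\|\tau^{\xi\eta}\|$ being the inverse of the matrix $\|\tau_{\xi\eta}\|$.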

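It will be observed that the matrix $\|\tau_{\xi\eta}\|$ is of the form

(8.4.7)
$\begin{Vmatrix} a & b & \cdots & b \\ b & a & \cdots & b \\ \vdots & \vdots & & \vdots \\ b & b & \cdots & a \end{Vmatrix}$

which has $(a - b)^{n-1}[a + (n - 1)b]$ as its determinant, and

(8.4.8)
$\begin{Vmatrix} A & B & \cdots & B \\ B & A & \cdots & B \\ \vdots & \vdots & & \vdots \\ B & B & \cdots & A \end{Vmatrix}$

as its inverse, where

(8.4.9) $A = \frac{a + (n - 2)b}{(a - b)[a + (n - 1)b]}, \qquad B = \frac{-b}{(a - b)[a + (n - 1)b]}.$

Therefore, we have

(8.4.10) $|\tau_{\xi\eta}| = \frac{(1 - 2it_2)^{n-1}}{\sigma^{2n}}$

and

(8.4.11)
$\tau^{\xi\xi} = \frac{\sigma^2\left(1 - \frac{2it_2}{n}\right)}{1 - 2it_2}, \quad \xi = 1, \ldots, n$
$\tau^{\xi\eta} = \frac{-2it_2\sigma^2}{n(1 - 2it_2)}, \quad \xi \ne \eta = 1, \ldots, n$
$\sum_{\xi,\eta=1}^{n}\tau^{\xi\eta} = n\sigma^2.$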
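
The inverse (8.4.8)-(8.4.9) and the sum in (8.4.11) are purely algebraic statements, so they can be checked in exact rational arithmetic. In the sketch below the rational number `w` merely stands in for $2it_2$ (an assumption made so that no complex arithmetic is needed; the identities hold for any value with $a \ne b$ and $a + (n-1)b \ne 0$). All names are illustrative.

```python
from fractions import Fraction


def pattern_matrix(diag, off, n):
    """n x n matrix with `diag` on the diagonal and `off` elsewhere, as in (8.4.7)."""
    return [[diag if i == j else off for j in range(n)] for i in range(n)]


def inverse_entries(a, b, n):
    """Diagonal and off-diagonal entries A, B of the inverse, as in (8.4.9)."""
    denom = (a - b) * (a + (n - 1) * b)
    return (a + (n - 2) * b) / denom, -b / denom


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


n = 4
sigma2 = Fraction(3)
w = Fraction(2, 7)                                   # stands in for 2*i*t_2
a = (1 - w * (1 - Fraction(1, n))) / sigma2          # tau_{xi xi} of (8.4.3)
b = w / (n * sigma2)                                 # tau_{xi eta} of (8.4.3)

A, B = inverse_entries(a, b, n)
product = matmul(pattern_matrix(a, b, n), pattern_matrix(A, B, n))
identity = pattern_matrix(Fraction(1), Fraction(0), n)
sum_of_inverse_entries = n * A + n * (n - 1) * B

assert product == identity
assert sum_of_inverse_entries == n * sigma2          # last line of (8.4.11)
```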


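Substituting from (8.4.11) into (8.4.5) and putting the resulting expression
on the right of (8.4.5) in the integral in (8.4.1), we obtain

(8.4.12) $\varphi(t_1, t_2) = e^{-t_1^2/2}\left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} e^{-\frac{1}{2}Q_0'}\, dx_1 \cdots dx_n.$

Following the same method as that used in connection with (7.4.16),
it can be seen that the integral in (8.4.12) has the value $(2\pi)^{n/2}/\sqrt{|\tau_{\xi\eta}|}$. But
$|\tau_{\xi\eta}| = (1 - 2it_2)^{n-1}/\sigma^{2n}$.
Therefore, we have

(8.4.13) $\varphi(t_1, t_2) = \varphi_1(t_1) \cdot \varphi_2(t_2)$

where

(8.4.14) $\varphi_1(t_1) = e^{-t_1^2/2}$

and

(8.4.15) $\varphi_2(t_2) = (1 - 2it_2)^{-\frac{1}{2}(n-1)}.$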



Since $\varphi(t_1, t_2)$ factors in the manner indicated in (8.4.13), it follows from
5.3.1 that $\sqrt{n}(\bar{x} - \mu)/\sigma$ and $(n - 1)s^2/\sigma^2$ are independent, their
characteristic functions (8.4.14) and (8.4.15) being those of the distribution
$N(0, 1)$ (see 7.2.1) and chi-square distribution $C(n - 1)$ [see (7.8.2)]
respectively. These distributions are uniquely determined by their
characteristic functions, thus completing the argument for 8.4.2.

The following important result follows from 8.4.2 and Section 7.8(b):

8.4.3 If $(x_1, \ldots, x_n)$ is a sample from $N(\mu, \sigma^2)$, then

(8.4.16) $t = \sqrt{n}(\bar{x} - \mu)/s$

has the "Student" distribution $S(n - 1)$.

For if we let $u = \sqrt{n}(\bar{x} - \mu)/\sigma$ and $v = (n - 1)s^2/\sigma^2$ in (7.8.3) we have
$t = \frac{u}{\sqrt{v/(n - 1)}} = \sqrt{n}(\bar{x} - \mu)/s$. But if the sample comes from $N(\mu, \sigma^2)$,
we know by 8.4.2 that $u$ and $v$ are independent and have as their
distributions $N(0, 1)$ and the chi-square distribution $C(n - 1)$, respectively.
Hence, applying 7.8.3 we obtain 8.4.3.
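
The algebra in this argument is easily illustrated numerically. The sketch below computes $t$ both from (8.4.16) and as $u/\sqrt{v/(n-1)}$; the second route needs a value of $\sigma$, but the result does not depend on which value is supplied. Function names and data are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, variance


def student_t(sample, mu):
    """t = sqrt(n) * (xbar - mu) / s, with s^2 the sample variance (divisor n - 1)."""
    n = len(sample)
    return sqrt(n) * (mean(sample) - mu) / sqrt(variance(sample))


def student_t_via_u_v(sample, mu, sigma):
    """The same ratio built from u = sqrt(n)(xbar - mu)/sigma and v = (n - 1)s^2/sigma^2."""
    n = len(sample)
    u = sqrt(n) * (mean(sample) - mu) / sigma
    v = (n - 1) * variance(sample) / sigma**2
    return u / sqrt(v / (n - 1))


sample = [4.1, 5.3, 6.0, 4.7, 5.9, 5.2]
t = student_t(sample, mu=5.0)
# sigma cancels: any positive value gives the same ratio (up to rounding).
t_a = student_t_via_u_v(sample, mu=5.0, sigma=0.5)
t_b = student_t_via_u_v(sample, mu=5.0, sigma=20.0)
```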

Remark. The result stated in 8.4.3 was first surmised by W. S. Gosset (1908)
writing under the name "Student". This result, as well as that stated in 8.4.2,
was later verified by Fisher (1926a), who attached the name "Student" to
the ratio defined by (8.4.16) as well as to the more general distribution having
p.d.f. (7.8.4). The fact that $v$ has the chi-square distribution with $n - 1$ degrees
of freedom was first established by Helmert (1876b).

It should be remarked that the converse of 8.4.2 is also true, which,
stated in its essential form, is that if $\bar{x}$ and $s^2$ are the mean and variance of
a sample of size $n$ from a p.d.f. $f(x)$, and if $\bar{x}$ and $s^2$ are independent, then
$f(x)$ is the p.d.f. of a normal distribution. This result was originally
obtained by Geary (1936), and more recently by Kawata and Sakamoto
(1949), and by Lukacs (1942). See Problem 8.33.

More basically Cramér (1936) has shown that if $x_1$ and $x_2$ are independent
random variables whose sum has a normal distribution, then $x_1$ and $x_2$
each has a normal distribution. This result extends by induction to the
case of several random variables.

The important feature of $t$ as defined by (8.4.16) for statistical inference,
as we shall see in later chapters, is that [assuming the sample comes from
the distribution $N(\mu, \sigma^2)$] neither it nor its distribution depends on $\sigma^2$.

Again, let us recall that the p.d.f. of a sample $(x_1, \ldots, x_n)$ from $N(\mu, \sigma^2)$
has the sum of squares $Q = \frac{1}{\sigma^2}\sum_{\xi=1}^{n}(x_\xi - \mu)^2$ in its exponent and we know
from 8.4.1 that $Q$ has the chi-square distribution $C(n)$. Note that $Q$ can
be decomposed as follows:

(8.4.17) $Q \equiv Q_1 + Q_2$

where

(8.4.18) $Q_1 = \frac{1}{\sigma^2}\sum_{\xi=1}^{n}(x_\xi - \bar{x})^2, \qquad Q_2 = \frac{n}{\sigma^2}(\bar{x} - \mu)^2.$

We have seen that $\sqrt{n}(\bar{x} - \mu)/\sigma$ has distribution $N(0, 1)$, and it follows
from 7.8.2a that $[\sqrt{n}(\bar{x} - \mu)/\sigma]^2$, that is, $Q_2$, has the chi-square distribution
$C(1)$. Since $Q_1$ and $\sqrt{n}(\bar{x} - \mu)/\sigma$ are independent, so are $Q_1$ and $Q_2$.
Thus $Q$, which has the chi-square distribution $C(n)$, has been decomposed
into two components $Q_1$ and $Q_2$ which are independent and have
chi-square distributions $C(n - 1)$ and $C(1)$, respectively. This leads to the
question whether there are conditions under which the sum of squares
appearing in the exponent of the sample p.d.f. can be decomposed into
several independent components having chi-square distributions if the
sample comes from a normal distribution. An answer is given in the
following theorem due to Cochran (1934). It is sufficient to consider
the case of sampling from the normal distribution $N(0, 1)$.

8.4.4 If $(x_1, \ldots, x_n)$ is a sample from the distribution $N(0, 1)$ and if
$\sum_{\xi=1}^{n} x_\xi^2 \equiv \sum_{i=1}^{k} Q_i$, where $Q_i$ is a non-negative quadratic form in $x_1, \ldots, x_n$
whose matrix has rank $n_i$, a necessary and sufficient condition for
the $Q_i$ to be independently distributed according to chi-square
distributions $C(n_i)$, $i = 1, \ldots, k$, is that $\sum_{i=1}^{k} n_i = n$.

First, let us show that the condition is necessary. Thus we assume that
the $Q_i$ are independently distributed according to chi-square distributions
$C(n_i)$, $i = 1, \ldots, k$, and we must show that this implies that $\sum_{i=1}^{k} n_i = n$.

But it follows from the reproductive property of the chi-square
distribution that $Q_1 + \cdots + Q_k$ has the chi-square distribution $C(n_1 + \cdots + n_k)$.
But $Q_1 + \cdots + Q_k \equiv \sum_{\xi=1}^{n} x_\xi^2$, and since the sample is from $N(0, 1)$ it
follows from 8.4.1 that $\sum_{\xi=1}^{n} x_\xi^2$ has the chi-square distribution $C(n)$, which
must therefore be identical with $C(n_1 + \cdots + n_k)$. Hence $\sum_{i=1}^{k} n_i = n$.

To establish the sufficiency of the condition stated in 8.4.4 we assume
that $n_1 + \cdots + n_k = n$, and we must show that there exists a nonsingular
linear transformation

(8.4.19) $x_\xi = \sum_{\eta=1}^{n} b_{\xi\eta}\, y_\eta, \quad \xi = 1, \ldots, n$

which transforms $Q_1, \ldots, Q_k$ as follows:

(8.4.20)
$Q_1 = y_1^2 + \cdots + y_{n_1}^2,$
$Q_2 = y_{n_1+1}^2 + \cdots + y_{n_1+n_2}^2,$
$\cdots$
$Q_k = y_{n_1+\cdots+n_{k-1}+1}^2 + \cdots + y_{n_1+\cdots+n_k}^2,$

where $n_1 + \cdots + n_k = n$, and where $y_1, \ldots, y_n$ are $n$ independent
random variables all having the distribution $N(0, 1)$. For if this can be
done, such a transformation will transform the p.e. of $(x_1, \ldots, x_n)$ to
the p.e. of a sample $(y_1, \ldots, y_n)$ from the distribution $N(0, 1)$. Thus,
$Q_1, \ldots, Q_k$, being sums of squares of mutually exclusive sets of
independent random variables all having distributions $N(0, 1)$, would be
independent and by 8.4.1 would have chi-square distributions $C(n_1), \ldots,
C(n_k)$, respectively.

Since $Q_1$ is a non-negative quadratic form with matrix of rank $n_1$ there
exist $n_1$ linearly independent linear combinations of $x_1, \ldots, x_n$ which we
may denote by $y_1, \ldots, y_{n_1}$ (see Bôcher (1907) or Birkhoff and MacLane
(1953), for example), namely

(8.4.21) $y_\xi = \sum_{\eta=1}^{n} b^{\xi\eta} x_\eta, \quad \xi = 1, \ldots, n_1$

such that

(8.4.22) $Q_1 = y_1^2 + \cdots + y_{n_1}^2.$

Similarly for $Q_2, \ldots, Q_k$ we have the following sets of $n_2, \ldots, n_k$
linearly independent linear combinations of $x_1, \ldots, x_n$ respectively:

(8.4.23)
$y_\xi = \sum_{\eta=1}^{n} b^{\xi\eta} x_\eta, \quad \xi = n_1 + 1, \ldots, n_1 + n_2$
$\cdots$
$y_\xi = \sum_{\eta=1}^{n} b^{\xi\eta} x_\eta, \quad \xi = n_1 + \cdots + n_{k-1} + 1, \ldots, n_1 + \cdots + n_{k-1} + n_k.$

We have shown that no linear dependence can exist among the $y$'s
associated with a single $Q$. We must next show that there can be no linear
dependence among $y$'s associated with two or more different $Q$'s. If this
were the case it would be possible to express a $y$ associated with one $Q$,
say $Q^*$, as a homogeneous linear combination of $y$'s from one or more
$Q$'s different from $Q^*$. This means that $Q_1 + \cdots + Q_k$ is expressible as a
non-negative quadratic form in at most $n - 1$ $y$'s. Since each $y$ is a
homogeneous linear combination of $x_1, \ldots, x_n$, $Q_1 + \cdots + Q_k$ would
then be a non-negative quadratic form in the $x$'s whose matrix would be
at most of rank $n - 1$. But we know that the matrix of $Q_1 + \cdots + Q_k$,
being equal to that of $\sum_{\xi=1}^{n} x_\xi^2$, is of rank $n$. Hence, it is not possible to have
linear dependence between the $y$'s associated with one $Q$, say $Q^*$, and
those associated with one or more $Q$'s different from $Q^*$. Therefore the
matrix

$\|b^{\xi\eta}\|, \quad \xi, \eta = 1, \ldots, n$

is nonsingular. Hence the transformation

(8.4.23) $y_\xi = \sum_{\eta=1}^{n} b^{\xi\eta} x_\eta, \quad \xi = 1, \ldots, n$

is non-singular and has a unique inverse

(8.4.24) $x_\xi = \sum_{\eta=1}^{n} b_{\xi\eta}\, y_\eta, \quad \xi = 1, \ldots, n,$

which, being a linear transformation, transforms $\sum_{\xi=1}^{n} x_\xi^2$, and hence
$Q_1 + \cdots + Q_k$, into

$(y_1^2 + \cdots + y_{n_1}^2) + (y_{n_1+1}^2 + \cdots + y_{n_1+n_2}^2) + \cdots + (y_{n_1+\cdots+n_{k-1}+1}^2 + \cdots + y_n^2).$

The transformation is seen to be orthogonal and hence the $y$'s are
independent random variables all having the distribution $N(0, 1)$. Therefore,
$Q_1, \ldots, Q_k$ have distributions which are identical with those of
$(y_1^2 + \cdots + y_{n_1}^2), \ldots, (y_{n_1+\cdots+n_{k-1}+1}^2 + \cdots + y_n^2)$ respectively, which in turn are
independent random variables with chi-square distributions $C(n_1), \ldots,
C(n_k)$ respectively. This completes the proof of the sufficiency condition
and concludes the proof of 8.4.4.

Cochran’s theorem has important applications in analysis of variance 
and normal regression theory as we shall see in Chapter 10. 
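
The rank condition of 8.4.4 can be seen at work in the simplest decomposition, $\sum x_\xi^2 \equiv \sum (x_\xi - \bar{x})^2 + n\bar{x}^2$, which is the case $\mu = 0$, $\sigma^2 = 1$ of (8.4.17). The sketch below writes down the matrices of the two quadratic forms, checks the identity at an arbitrary point, and computes the ranks by exact Gaussian elimination. The helper names are illustrative and the elimination routine is a plain textbook implementation, not anything taken from the text.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix, by Gauss-Jordan elimination in exact arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def quadratic_form(matrix, x):
    n = len(x)
    return sum(matrix[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def mean_split(n):
    """Matrices of Q_1 = sum (x - xbar)^2 and Q_2 = n * xbar^2."""
    q2 = [[Fraction(1, n) for _ in range(n)] for _ in range(n)]
    q1 = [[Fraction(int(i == j)) - q2[i][j] for j in range(n)] for i in range(n)]
    return q1, q2


n = 5
q1, q2 = mean_split(n)
x = [Fraction(v) for v in (2, -1, 0, 3, 1)]

# Q_1 + Q_2 reproduces the sum of squares, and the ranks add up to n.
assert quadratic_form(q1, x) + quadratic_form(q2, x) == sum(v * v for v in x)
assert rank(q1) + rank(q2) == n
```

Here the ranks are $n - 1$ and 1, in agreement with the chi-square distributions $C(n - 1)$ and $C(1)$ found above.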

8.5 SAMPLING FROM A FINITE POPULATION 
(a) The One-Dimensional Case 

In the definition of a random sample $(x_1, \ldots, x_n)$ given in Section 8.1,
the elements of the sample were defined as independent random variables
all having the same c.d.f. $F(x)$. The theory of sampling based on this
definition is sometimes referred to as simple random sampling from an
infinite population where the population has $F(x)$ as its c.d.f. This
terminology is suggested by the fact that since the sample elements
$x_1, \ldots, x_n$ are regarded as a sequence of independent random variables
all having the same distribution as that of the population, we may think
of $x_1, \ldots, x_n$ as describing the results of successive "drawings" from the
population, the population distribution remaining unchanged throughout
the "drawings." Unless otherwise stated, "a sample of size $n$" will always
refer to sampling from an infinite population.

There is also a theory of simple random sampling from a finite population
$\pi_N$, in which the population $\pi_N$ consists of a finite number of $N$ objects,
$o_1, \ldots, o_N$. If we regard $\pi_N$ as a basic sample space with $o_1, \ldots, o_N$
as sample points and if we let $x(o)$ be a random variable defined on each
of these sample points, so that $x(o_t) = x_{0t}$, $t = 1, \ldots, N$, then $x(o)$ maps
$o_1, \ldots, o_N$ onto points $x_{01}, \ldots, x_{0N}$ in $R_1$. Note that $x_{01}, \ldots, x_{0N}$ may
be regarded as simply the $x$-values of the $N$ objects in $\pi_N$. It is convenient,
and there is no loss of generality, if we assume that the objects in $\pi_N$ are
labeled so that $x_{01} \le \cdots \le x_{0N}$. If we assign equal probabilities $1/N$
to $o_1, \ldots, o_N$, and if we denote the random variable $x(o)$ by $x$, and its p.f.
by $p(x)$, then the mass points (that is, sample space) of $x$ are $x_{01}, \ldots, x_{0N}$
and

(8.5.1) $p(x) = \frac{1}{N}$

at each of these mass points. In case $x(o)$ has the same value for two or
more elements from $\pi_N$, then $p(x)$ will be a multiple of $1/N$ at that value of
$x(o)$. More specifically if $x(o) = x'$ for $r$ elements from $\pi_N$, then $p(x') = r/N$.
We may think of $x$ as the random variable denoting the $x$-value of a single
object "drawn at random" from $\pi_N$. It is convenient to call $p(x)$ thus
defined the p.f. of the finite population.

Now suppose $(o_{\gamma_1}, \ldots, o_{\gamma_n})$ is some permutation of $n$ of the $N$ objects
in $\pi_N$. There are $N!/(N - n)!$ such $n$-permutations. Now consider these
$n$-permutations as the sample points in a basic sample space $R$. If
$(o_{\gamma_1}, \ldots, o_{\gamma_n})$ is a sample point $e$ in $R$, let $x_1(e), \ldots, x_n(e)$ be random
variables such that $x_1(e) = x_{0\gamma_1}, \ldots, x_n(e) = x_{0\gamma_n}$. Denoting $(x_1(e), \ldots, x_n(e))$
by $(x_1, \ldots, x_n)$ we shall refer to this $n$-dimensional random variable as a
sample of size $n$ from the finite population $\pi_N$. We may think of $x_1$ as the
random variable denoting the $x$-value of the first object "drawn at random"
from $\pi_N$, $x_2$ that for the second object, ..., $x_n$ that for the $n$th object, the
objects being drawn from $\pi_N$ without replacement.

It is evident that $(x_1, \ldots, x_n)$ is a discrete $n$-dimensional random variable.
If we assign all sample points $e$ in $R$ the same probability, namely,

(8.5.2) $\frac{(N - n)!}{N!}$

then $p(x_1, \ldots, x_n)$, the p.f. of $(x_1, \ldots, x_n)$, is a multiple of $(N - n)!/N!$
at each mass point.

Actually, it is convenient in the discussion which follows if we consider
the $x$-values of the $N$ objects in $\pi_N$ to be distinct and the objects labelled
so that $x_{01} < \cdots < x_{0N}$. Then the mass points of the random variable $x_\xi$
are $x_{01}, \ldots, x_{0N}$, $\xi = 1, \ldots, n$. Let $R_1^{(\xi)}$ be the real axis corresponding
to random variable $x_\xi$ and let $E_1^{(\xi)}$ denote the set of points $x_{01}, \ldots, x_{0N}$
in $R_1^{(\xi)}$. Let $R_n$ be the Cartesian product $R_1^{(1)} \times \cdots \times R_1^{(n)}$ and $E_n$ the
Cartesian product $E_1^{(1)} \times \cdots \times E_1^{(n)}$. Then the mass points of $(x_1, \ldots, x_n)$
form a set of points $E_n'$ consisting of those points in $E_n$ for which no two
coordinates are equal. There are exactly $N!/(N - n)!$ points in $E_n'$ and
there is a one-to-one correspondence between the points in $E_n'$ and the
$e$-points in $R$. In other words the $n$-dimensional random variable
$(x_1(e), \ldots, x_n(e))$ maps the $e$-points of $R$ into the points $E_n'$ in $R_n$. Thus,
the p.f. of $(x_1, \ldots, x_n)$ is defined at the points in its sample space $R_n$ as
follows:

(8.5.3)
$p(x_1, \ldots, x_n) = \frac{(N - n)!}{N!}, \quad (x_1, \ldots, x_n) \in E_n'$
$= 0$, otherwise.

The mass point of $(x_1, \ldots, x_n)$ corresponding to the objects $o_{\gamma_1}, \ldots, o_{\gamma_n}$
is $(x_{0\gamma_1}, \ldots, x_{0\gamma_n})$.

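Note that the marginal p.f. of $(x_1, \ldots, x_{n-1})$ in (8.5.3), that is, the sum
of $p(x_1, \ldots, x_n)$ with respect to $x_n$, gives

$p_{1 \cdots n-1}(x_1, \ldots, x_{n-1}) = (N - n + 1)\frac{(N - n)!}{N!} = \frac{(N - n + 1)!}{N!}$

at each point of $E_1^{(1)} \times \cdots \times E_1^{(n-1)}$ for which the $n - 1$ coordinates are
all different, while $p_{1 \cdots n-1}(x_1, \ldots, x_{n-1})$ is zero at all other points in
$E_1^{(1)} \times \cdots \times E_1^{(n-1)}$. More generally, the marginal p.f. of $(x_{\xi_1}, \ldots, x_{\xi_r})$
is given by

(8.5.4) $p_{\xi_1 \cdots \xi_r}(x_{\xi_1}, \ldots, x_{\xi_r}) = \frac{(N - r)!}{N!}$

at those points of $E_1^{(\xi_1)} \times \cdots \times E_1^{(\xi_r)}$ whose coordinates are all different,
and zero at all other points in $E_1^{(\xi_1)} \times \cdots \times E_1^{(\xi_r)}$. The p.f. of any single
component of $(x_1, \ldots, x_n)$, say $x_\xi$, is given by (8.5.1), that is,

(8.5.5) $p_\xi(x_\xi) = \frac{1}{N},$

the mass points of $x_\xi$ being $x_{01}, \ldots, x_{0N}$. Thus, any component of
$(x_1, \ldots, x_n)$ has the same distribution as the population distribution.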
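
The mean $\mu_N$ and variance $\sigma_N^2$ of the finite population will be defined
as follows:

(8.5.6) $\mu_N = \frac{1}{N}\sum_{t=1}^{N} x_{0t}, \qquad \sigma_N^2 = \frac{1}{N - 1}\sum_{t=1}^{N}(x_{0t} - \mu_N)^2.$

We shall ordinarily drop $N$ and refer to $\mu_N$ and $\sigma_N^2$ as $\mu$ and $\sigma^2$.
Furthermore, it is obvious that for $\xi = 1, \ldots, n$

(8.5.7) $\mathcal{E}(x_\xi) = \mu, \qquad \sigma^2(x_\xi) = \frac{N - 1}{N}\sigma^2.$

It is worthwhile to examine the marginal p.f. (8.5.4) for $r = 2$, that is,
the p.f. of any two components $(x_\xi, x_\eta)$ of $(x_1, \ldots, x_n)$. We find

(8.5.8) $p_{\xi\eta}(x_\xi, x_\eta) = \frac{1}{N(N - 1)},$

the mass points being $(x_{0t}, x_{0t'})$, $t \ne t' = 1, \ldots, N$. The covariance
between $x_\xi$ and $x_\eta$ is given by

(8.5.9)
$\operatorname{cov}(x_\xi, x_\eta) = \mathcal{E}[(x_\xi - \mu)(x_\eta - \mu)]$
$= \sum_{t \ne t'} \frac{(x_{0t} - \mu)(x_{0t'} - \mu)}{N(N - 1)}$
$= \frac{\left[\sum_{t=1}^{N}(x_{0t} - \mu)\right]^2}{N(N - 1)} - \sum_{t=1}^{N}\frac{(x_{0t} - \mu)^2}{N(N - 1)}.$

Making use of (8.5.6) we find that

(8.5.10) $\operatorname{cov}(x_\xi, x_\eta) = -\frac{\sigma^2}{N}.$

It will be noted that the correlation coefficient between $x_\xi$ and $x_\eta$ has the
value

(8.5.11) $\rho(x_\xi, x_\eta) = -\frac{1}{N - 1}.$

It will be recalled from (8.2.4) that $\operatorname{cov}(x_\xi, x_\eta) = 0$ in the case of a sample
from an infinite population. Summarizing: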

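8.5.1 If $(x_1, \ldots, x_n)$ is a sample of size $n$ from a finite population $\pi_N$, then

$\mathcal{E}(x_\xi) = \mu, \qquad \sigma^2(x_\xi) = \frac{N - 1}{N}\sigma^2, \qquad \xi = 1, \ldots, n$

$\operatorname{cov}(x_\xi, x_\eta) = -\frac{\sigma^2}{N}, \qquad \rho(x_\xi, x_\eta) = -\frac{1}{N - 1}, \qquad \xi \ne \eta = 1, \ldots, n,$

where $\mu$ and $\sigma^2$ are defined by (8.5.6).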
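
For a small population the statements in 8.5.1 can be confirmed by listing every ordered pair of distinct objects, each of which carries probability $1/[N(N-1)]$ by (8.5.8). The following sketch does this in exact arithmetic; it assumes integer $x$-values purely for convenience, and the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations


def population_mean_variance(values):
    """mu and sigma^2 as defined in (8.5.6) (divisor N - 1)."""
    N = len(values)
    mu = Fraction(sum(values), N)
    sigma2 = sum((v - mu) ** 2 for v in values) / (N - 1)
    return mu, sigma2


def pair_moments(values):
    """Variance of one component and covariance of two components of the sample."""
    mu, _sigma2 = population_mean_variance(values)
    pairs = list(permutations(values, 2))      # ordered pairs of distinct objects
    w = Fraction(1, len(pairs))                # 1 / (N (N - 1))
    var = sum(w * (a - mu) ** 2 for a, b in pairs)
    cov = sum(w * (a - mu) * (b - mu) for a, b in pairs)
    return var, cov


values = [2, 3, 5, 7, 11, 13]                  # hypothetical x-values, N = 6
N = len(values)
mu, sigma2 = population_mean_variance(values)
var, cov = pair_moments(values)

assert var == Fraction(N - 1, N) * sigma2
assert cov == -sigma2 / N
assert cov / var == Fraction(-1, N - 1)        # correlation coefficient
```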

The problem of determining the sampling theory of any function
$g(x_1, \ldots, x_n)$ of a sample $(x_1, \ldots, x_n)$ from the finite population $\pi_N$
simply amounts to finding the distribution function of $g(x_1, \ldots, x_n)$,
where $(x_1, \ldots, x_n)$ is a random variable having p.f. (8.5.3). The reader
should note that the preceding two pages of results hold with only minor
changes if some of the $<$'s in $x_{01} < \cdots < x_{0N}$ are replaced by $=$'s.

(b) Mean Values of Sample Mean and Variance 

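For illustrative purposes we shall first consider two particular functions
of $(x_1, \ldots, x_n)$, namely, $\bar{x}$ and $s^2$. Since $\bar{x} = \frac{1}{n}\sum_{\xi=1}^{n} x_\xi$ we have

(8.5.12) $\mathcal{E}(\bar{x}) = \mathcal{E}\left(\frac{1}{n}\sum_{\xi} x_\xi\right) = \frac{1}{n}\sum_{\xi}\mathcal{E}(x_\xi).$

But $\mathcal{E}(x_\xi) = \mu$, $\xi = 1, \ldots, n$. Hence

(8.5.13) $\mathcal{E}(\bar{x}) = \mu.$

Thus, $\bar{x}$ is an unbiased estimator for $\mu$.

For $\sigma^2(\bar{x})$ we have

$\sigma^2(\bar{x}) = \mathcal{E}(\bar{x} - \mu)^2 = \mathcal{E}\left[\frac{1}{n}\sum_{\xi}(x_\xi - \mu)\right]^2$
$= \frac{1}{n^2}\sum_{\xi}\mathcal{E}(x_\xi - \mu)^2 + \frac{1}{n^2}\sum_{\xi \ne \eta}\mathcal{E}[(x_\xi - \mu)(x_\eta - \mu)].$

But $\mathcal{E}(x_\xi - \mu)^2 = \frac{N - 1}{N}\sigma^2$, $\xi = 1, \ldots, n$, and for $\xi \ne \eta$

$\mathcal{E}[(x_\xi - \mu)(x_\eta - \mu)] = \operatorname{cov}(x_\xi, x_\eta) = -\frac{\sigma^2}{N}.$

Hence

(8.5.14) $\sigma^2(\bar{x}) = \left(\frac{1}{n} - \frac{1}{N}\right)\sigma^2.$

Now let us consider the mean value of $s^2$. We have

$s^2 = \frac{1}{n - 1}\sum_{\xi}(x_\xi - \bar{x})^2$
$= \frac{1}{n}\sum_{\xi}(x_\xi - \mu)^2 - \frac{1}{n(n - 1)}\sum_{\xi \ne \eta}(x_\xi - \mu)(x_\eta - \mu).$

Taking mean values, and using 8.5.1, we find that

(8.5.15) $\mathcal{E}(s^2) = \sigma^2.$

Hence, $s^2$ is an unbiased estimator for $\sigma^2$.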

Summarizing, we have the following result:

8.5.2 Suppose $(x_1, \ldots, x_n)$ is a sample from a finite population $\pi_N$ having
mean $\mu$ and variance $\sigma^2$. Then we have

$\mathcal{E}(\bar{x}) = \mu, \qquad \sigma^2(\bar{x}) = \left(\frac{1}{n} - \frac{1}{N}\right)\sigma^2, \qquad \mathcal{E}(s^2) = \sigma^2.$

It should be noted that the mean and variance of the sample sum are
given by

(8.5.16) $\mathcal{E}(z) = n\mu, \qquad \sigma^2(z) = n^2\left(\frac{1}{n} - \frac{1}{N}\right)\sigma^2.$

Finally, it should be observed that if $\mu_N$ and $\sigma_N^2$, as defined in (8.5.6),
which are functions of $x_{01}, \ldots, x_{0N}$, are such that they have finite limits
as $N \to \infty$, which can be denoted by $\mu$ and $\sigma^2$, then it will be seen that the
limits of $\mathcal{E}(\bar{x})$ and $\sigma^2(\bar{x})$ in 8.5.2 as $N \to \infty$ are identical with the values of
$\mathcal{E}(\bar{x})$ and $\sigma^2(\bar{x})$ for samples from infinite populations as given by 8.2.1.
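
The three statements of 8.5.2 can be verified for any small population by enumerating all $N!/(N - n)!$ equally probable $n$-permutations. The sketch below does so exactly; the population values are hypothetical integers and the function name is an assumption of the example.

```python
from fractions import Fraction
from itertools import permutations


def finite_population_moments(values, n):
    """Exact E(xbar), var(xbar) and E(s^2) for samples of size n without replacement."""
    N = len(values)
    mu = Fraction(sum(values), N)
    sigma2 = sum((v - mu) ** 2 for v in values) / (N - 1)     # (8.5.6)

    samples = list(permutations(values, n))                   # all n-permutations
    w = Fraction(1, len(samples))                             # (N - n)!/N!, as in (8.5.2)
    means = [Fraction(sum(s), n) for s in samples]

    e_mean = sum(w * m for m in means)
    var_mean = sum(w * (m - mu) ** 2 for m in means)
    e_s2 = sum(w * sum((x - m) ** 2 for x in s) / (n - 1)
               for s, m in zip(samples, means))
    return mu, sigma2, e_mean, var_mean, e_s2


values, n = [1, 4, 6, 9, 15], 3
N = len(values)
mu, sigma2, e_mean, var_mean, e_s2 = finite_population_moments(values, n)

assert e_mean == mu
assert var_mean == (Fraction(1, n) - Fraction(1, N)) * sigma2
assert e_s2 == sigma2
```

The factor $1/n - 1/N$ vanishes when $n = N$, as it must, since the sample mean is then the population mean with certainty.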

(c) Mean Values of Certain Symmetric Functions of a Sample from a 
Finite Population 

The basic ideas in Section 8.5(b) can be extended to obtain mean values
of certain general symmetric functions of the sample. Thus, suppose
$(x_1, \ldots, x_n)$ is a sample from a finite population $\pi_N$ and consider a
function $g(x_{\xi_1}, \ldots, x_{\xi_r})$, $\xi_1, \ldots, \xi_r$ all being different, $r \le n$. Then it
is evident from the form of the p.f. of $(x_{\xi_1}, \ldots, x_{\xi_r})$ given by (8.5.4) that

(8.5.17) $\mathcal{E}[g(x_{\xi_1}, \ldots, x_{\xi_r})] = \frac{(N - r)!}{N!}\sum\nolimits_N' g(x_{0t_1}, \ldots, x_{0t_r})$

where $\sum_N'$ denotes summation over all values of $t_1, \ldots, t_r$ which are
all different and where each $t$ ranges over the integers $1, \ldots, N$. Now
let us form the following symmetric function of $(x_1, \ldots, x_n)$:

(8.5.18) $Q(x_1, \ldots, x_n) = \frac{(n - r)!}{n!}\sum\nolimits_n' g(x_{\xi_1}, \ldots, x_{\xi_r})$

where $\sum_n'$ has a meaning for $\xi_1, \ldots, \xi_r$ similar to that of $\sum_N'$ for $t_1, \ldots, t_r$.
If we take the mean value of $Q(x_1, \ldots, x_n)$, we have

(8.5.19) $\mathcal{E}(Q(x_1, \ldots, x_n)) = \frac{(n - r)!}{n!}\sum\nolimits_n' \mathcal{E}[g(x_{\xi_1}, \ldots, x_{\xi_r})].$

But $\mathcal{E}[g(x_{\xi_1}, \ldots, x_{\xi_r})]$ has the same value for all values of $\xi_1 \ne \cdots
\ne \xi_r$, namely, that given by the right-hand side of (8.5.17). Since there
are $n!/(n - r)!$ such sets of values of $\xi_1, \ldots, \xi_r$, we obtain

(8.5.20) $\mathcal{E}(Q(x_1, \ldots, x_n)) = \frac{(N - r)!}{N!}\sum\nolimits_N' g(x_{0t_1}, \ldots, x_{0t_r}) = Q(x_{01}, \ldots, x_{0N}),$

that is, $Q(x_1, \ldots, x_n)$ is an unbiased estimator for $Q(x_{01}, \ldots, x_{0N})$.

Therefore, we have the following result:

8.5.3 If $(x_1, \ldots, x_n)$ is a sample from a finite population $\pi_N$ and
if $Q(x_1, \ldots, x_n)$ is any symmetric function of $(x_1, \ldots, x_n)$ of
form (8.5.18), then $Q(x_1, \ldots, x_n)$ is an unbiased estimator for
$Q(x_{01}, \ldots, x_{0N})$.
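
A direct numerical check of 8.5.3 is straightforward for small $N$. In the sketch below `Q` implements (8.5.18) for an arbitrary function `g` of `r` arguments, and `expected_Q` averages it over all equally probable samples; the choice $g(a, b) = a^2 b$ and the population values are illustrative assumptions.

```python
from fractions import Fraction
from itertools import permutations


def Q(values, g, r):
    """(8.5.18): average of g over all ordered r-tuples of distinct elements."""
    tuples = list(permutations(values, r))        # n!/(n - r)! tuples
    return sum(Fraction(g(*t)) for t in tuples) / len(tuples)


def expected_Q(population, n, g, r):
    """Mean value of Q over all n-permutations of the population."""
    samples = list(permutations(population, n))   # N!/(N - n)! samples
    return sum(Q(s, g, r) for s in samples) / len(samples)


def g(a, b):
    return a * a * b


population = [1, 2, 4, 8, 9]                      # hypothetical x-values
# Unbiasedness: the mean of Q over samples of size 3 equals Q of the population.
assert expected_Q(population, 3, g, 2) == Q(population, g, 2)
```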

Furthermore, it follows from 3.2.1 and 3.2.2 that

8.5.4 If $Q_1(x_1, \ldots, x_n), \ldots, Q_s(x_1, \ldots, x_n)$ are any $s$ symmetric functions
of type (8.5.18) of a sample $(x_1, \ldots, x_n)$ from a finite population $\pi_N$,
then, for any set of constants $d_1, \ldots, d_s$, $d_1Q_1(x_1, \ldots, x_n) + \cdots +
d_sQ_s(x_1, \ldots, x_n)$ is an unbiased estimator of $d_1Q_1(x_{01}, \ldots, x_{0N}) +
\cdots + d_sQ_s(x_{01}, \ldots, x_{0N})$.

If in (8.5.18) we take $g(x_{\xi_1}, \ldots, x_{\xi_r}) \equiv x_{\xi_1}^{c_1} \cdots x_{\xi_r}^{c_r}$, where $c_1, \ldots, c_r$
are positive integers (or zero), then we obtain a class of symmetric
functions $\{Q(x_1, \ldots, x_n)\}$ such that the sample mean, sample variance,
$k$-statistics, and other polynomials in sample moments can be expressed
as linear functions of $Q$'s in this class. This fact together with theorems
8.5.3 and 8.5.4 make the sampling theory of means, variances, higher
moments, and especially $k$-statistics quite simple as Tukey (1950, 1956a)
has shown.

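Example. To illustrate let us determine the value of $\mathcal{E}(\bar{x}^3)$.

Now

$\bar{x}^3 = \frac{1}{n^3}\left[\sum_{\xi=1}^{n} x_\xi\right]^3 = \frac{1}{n^2}Q_1(x_1, \ldots, x_n) + \frac{3(n - 1)}{n^2}Q_2(x_1, \ldots, x_n) + \frac{(n - 1)(n - 2)}{n^2}Q_3(x_1, \ldots, x_n),$

where

$Q_1 = \frac{1}{n}\sum_{\xi} x_\xi^3, \qquad Q_2 = \frac{1}{n(n - 1)}\sum_{\xi \ne \eta} x_\xi^2 x_\eta, \qquad Q_3 = \frac{1}{n(n - 1)(n - 2)}\sum_{\xi \ne \eta \ne \zeta} x_\xi x_\eta x_\zeta,$

and which are functions of type (8.5.18).

Applying 8.5.4 we therefore have

$\mathcal{E}(\bar{x}^3) = \frac{1}{n^2}Q_1(x_{01}, \ldots, x_{0N}) + \frac{3(n - 1)}{n^2}Q_2(x_{01}, \ldots, x_{0N}) + \frac{(n - 1)(n - 2)}{n^2}Q_3(x_{01}, \ldots, x_{0N}).$

If we let $\mu_r' = \frac{1}{N}\sum_{t=1}^{N} x_{0t}^r$, $r = 1, 2, 3$, we note that

$Q_1(x_{01}, \ldots, x_{0N}) = \mu_3',$
$Q_2(x_{01}, \ldots, x_{0N}) = \frac{1}{N - 1}(N\mu_2'\mu_1' - \mu_3'),$
$Q_3(x_{01}, \ldots, x_{0N}) = \frac{N^2\mu_1'^3 - 3N\mu_1'\mu_2' + 2\mu_3'}{(N - 1)(N - 2)},$

and hence $\mathcal{E}(\bar{x}^3)$ can be expressed as a polynomial in the first three population
moments.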
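
The formula of this example can be compared with a brute-force evaluation of $\mathcal{E}(\bar{x}^3)$ over all samples. The sketch below assumes $N \ge 3$ and integer population values; the function names are illustrative only.

```python
from fractions import Fraction
from itertools import permutations


def mean_cubed_from_moments(population, n):
    """E(xbar^3) from the first three population moments about the origin."""
    N = len(population)
    m1, m2, m3 = (Fraction(sum(x ** r for x in population), N) for r in (1, 2, 3))
    q1 = m3
    q2 = (N * m2 * m1 - m3) / (N - 1)
    q3 = (N ** 2 * m1 ** 3 - 3 * N * m1 * m2 + 2 * m3) / ((N - 1) * (N - 2))
    return (q1 + 3 * (n - 1) * q2 + (n - 1) * (n - 2) * q3) / n ** 2


def mean_cubed_by_enumeration(population, n):
    """E(xbar^3) by averaging over all n-permutations of the population."""
    samples = list(permutations(population, n))
    return sum(Fraction(sum(s), n) ** 3 for s in samples) / len(samples)


population = [1, 2, 4, 8, 9]                       # hypothetical x-values
assert mean_cubed_from_moments(population, 3) == mean_cubed_by_enumeration(population, 3)
```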


(d) The k-Dimensional Case

Now consider the situation in which $x(o)$, referred to in Section 8.5(a),
is a vector with $k$ components, say $x_1(o), \ldots, x_k(o)$. Then each object
in $\pi_N$ is "measured" by $k$ numbers. Thus, if $x_i(o_t) = x_{0it}$, we may refer
to $x_{0it}$ as the $x_i$ value of $o_t$, and if the vectors for the $N$ objects in $\pi_N$
are all different, our vector random variable $(x_1(o), \ldots, x_k(o))$ maps the
objects in $\pi_N$ into $N$ points in $R_k$, the coordinates of the point into which
$o_t$ is mapped being $(x_{01t}, \ldots, x_{0kt})$, $t = 1, \ldots, N$. If the probabilities
associated with all of these $N$ points are assigned equal values, namely, $1/N$,
then we have a $k$-dimensional random variable $(x_1, \ldots, x_k)$ whose p.f. is

(8.5.21) $p(x_1, \ldots, x_k) = \frac{1}{N},$

the mass points being $(x_{01t}, \ldots, x_{0kt})$, $t = 1, \ldots, N$. We may regard
$p(x_1, \ldots, x_k)$ as the p.f. of the finite population $\pi_N$.

Now consider the basic sample space $R$ of the $N!/(N - n)!$ $n$-permutations
referred to in Section 8.5(a) in which $(o_{\gamma_1}, \ldots, o_{\gamma_n})$ is a typical
sample point $e$. Let $(x_{i1}(e), \ldots, x_{in}(e);\ i = 1, \ldots, k)$ be $nk$ random
variables so that $x_{i1}(e) = x_{0i\gamma_1}, \ldots, x_{in}(e) = x_{0i\gamma_n}$. If we denote
$(x_{i1}(e), \ldots, x_{in}(e);\ i = 1, \ldots, k)$ by $(x_{i\xi};\ i = 1, \ldots, k;\ \xi = 1, \ldots, n)$,
we shall refer to this $nk$-dimensional random variable as a sample of size
$n$ from the $k$-variate finite population $\pi_N$ having p.f. (8.5.3). There will be
$N!/(N - n)!$ mass points in $R_{nk}$ of this $nk$-dimensional random variable,
at each of which the assigned probability will be $(N - n)!/N!$. The mass
point corresponding to $(o_{\gamma_1}, \ldots, o_{\gamma_n})$ is $(x_{0i\gamma_1}, \ldots, x_{0i\gamma_n};\ i = 1, \ldots, k)$.
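
The preceding setup is sufficient to provide the sampling theory of
functions of our sample, the $nk$-dimensional random variable
$(x_{i\xi};\ i = 1, \ldots, k;\ \xi = 1, \ldots, n)$, such as the vectors of sample sums and
means

(8.5.22) $(z_1, \ldots, z_k), \qquad (\bar{x}_1, \ldots, \bar{x}_k)$

and the sample covariance matrix $\|s_{ij}\|$ where

(8.5.23) $s_{ij} = \frac{1}{n - 1}\sum_{\xi=1}^{n}(x_{i\xi} - \bar{x}_i)(x_{j\xi} - \bar{x}_j), \quad i, j = 1, \ldots, k.$

If we define the vector of means $(\mu_1, \ldots, \mu_k)$ of $\pi_N$ as

(8.5.24) $\mu_i = \frac{1}{N}\sum_{t=1}^{N} x_{0it}$

and the covariance matrix $\|\sigma_{ij}\|$ of $\pi_N$ as

(8.5.25) $\sigma_{ij} = \frac{1}{N - 1}\sum_{t=1}^{N}(x_{0it} - \mu_i)(x_{0jt} - \mu_j), \quad i, j = 1, \ldots, k,$

then it can be shown by mean value operations similar to those by which
(8.5.12) through (8.5.15) were found that

(8.5.26)
$\mathcal{E}(\bar{x}_i) = \mu_i, \qquad \sigma^2(\bar{x}_i) = \left(\frac{1}{n} - \frac{1}{N}\right)\sigma_{ii}, \quad i = 1, \ldots, k$
$\operatorname{cov}(\bar{x}_i, \bar{x}_j) = \left(\frac{1}{n} - \frac{1}{N}\right)\sigma_{ij}, \quad i \ne j = 1, \ldots, k$
$\mathcal{E}(s_{ij}) = \sigma_{ij}, \quad i, j = 1, \ldots, k.$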


8.6 MATRIX SAMPLING 


(a) Second-Order Matrix Samples 

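Suppose $u$ and $v$ are independent random variables having c.d.f.'s
$F_1(u)$ and $F_2(v)$. Let $x(u, v)$ be a random variable for which

(8.6.1)
$\mu = \mathcal{E}(x(u, v)),$
$\mu_{u\cdot} = \int_{-\infty}^{\infty} x(u, v)\, dF_2(v),$
$\mu_{\cdot v} = \int_{-\infty}^{\infty} x(u, v)\, dF_1(u).$

Furthermore, let $\varepsilon_{u\cdot}$, $\varepsilon_{\cdot v}$ and $\varepsilon_{uv}$ be random variables defined as follows:

(8.6.2)
$\varepsilon_{u\cdot} = \mu_{u\cdot} - \mu,$
$\varepsilon_{\cdot v} = \mu_{\cdot v} - \mu,$
$\varepsilon_{uv} = x(u, v) - \mu_{u\cdot} - \mu_{\cdot v} + \mu.$

Then it is evident that the mean of each of these random variables is
zero and also the covariance between each pair is zero.

Thus, we have the following decomposition theorem:

8.6.1 If $x(u, v)$ is a random variable where $u$ and $v$ are independent random
variables, then $x(u, v)$ can be decomposed as follows:

(8.6.3) $x(u, v) = \mu + \varepsilon_{u\cdot} + \varepsilon_{\cdot v} + \varepsilon_{uv}$

where $\varepsilon_{u\cdot}$, $\varepsilon_{\cdot v}$ and $\varepsilon_{uv}$ have zero means and zero covariances and are
defined by (8.6.2). Furthermore,

(8.6.4) $\sigma^2(x(u, v)) = \sigma^2(\varepsilon_{u\cdot}) + \sigma^2(\varepsilon_{\cdot v}) + \sigma^2(\varepsilon_{uv}).$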

Remark. It should be noted that a more general form of this theorem (and
subsequent theorems of this section) holds if $u$ and $v$ are considered as independent
sample points in two arbitrary sample spaces rather than random
variables. In such spaces we would, of course, have probability measures and
$x(u, v)$ would be a random variable measurable with respect to the Cartesian
product of these two probability measures.


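It will be convenient to abbreviate the notation for the variances as
follows:

(8.6.5)
$\sigma^2(x(u, v)) = \sigma^2$
$\sigma^2(\varepsilon_{u\cdot}) = \sigma^2_{\cdot 0}$
$\sigma^2(\varepsilon_{\cdot v}) = \sigma^2_{0\cdot}$
$\sigma^2(\varepsilon_{uv}) = \sigma^2_{\cdot\cdot}$

Then (8.6.4) becomes

(8.6.4a) $\sigma^2 = \sigma^2_{\cdot 0} + \sigma^2_{0\cdot} + \sigma^2_{\cdot\cdot}.$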
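
The decomposition 8.6.1 is easy to verify when $u$ and $v$ are discrete with finitely many values. In the following sketch `x[a][b]` is the value of $x(u, v)$ at the $a$-th value of $u$ and the $b$-th value of $v$, and `pu`, `pv` are the corresponding probabilities; the table, the probabilities and all names are assumptions made for the illustration.

```python
from fractions import Fraction


def decompose(x, pu, pv):
    """Return mu and the components of (8.6.2) for a discrete x(u, v)."""
    R, S = len(pu), len(pv)
    mu_u = [sum(pv[b] * x[a][b] for b in range(S)) for a in range(R)]   # mean over v
    mu_v = [sum(pu[a] * x[a][b] for a in range(R)) for b in range(S)]   # mean over u
    mu = sum(pu[a] * mu_u[a] for a in range(R))
    e_u = [m - mu for m in mu_u]
    e_v = [m - mu for m in mu_v]
    e_uv = [[x[a][b] - mu_u[a] - mu_v[b] + mu for b in range(S)] for a in range(R)]
    return mu, e_u, e_v, e_uv


def mean_value(f, pu, pv):
    """Mean value of f(a, b) under the product distribution of u and v."""
    return sum(pu[a] * pv[b] * f(a, b)
               for a in range(len(pu)) for b in range(len(pv)))


pu = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
pv = [Fraction(1, 4), Fraction(3, 4)]
x = [[1, 4], [2, 0], [7, 3]]

mu, e_u, e_v, e_uv = decompose(x, pu, pv)
E = lambda f: mean_value(f, pu, pv)

# Zero means and zero covariances.
assert E(lambda a, b: e_u[a]) == E(lambda a, b: e_v[b]) == E(lambda a, b: e_uv[a][b]) == 0
assert E(lambda a, b: e_u[a] * e_v[b]) == 0
assert E(lambda a, b: e_u[a] * e_uv[a][b]) == 0
assert E(lambda a, b: e_v[b] * e_uv[a][b]) == 0

# Variance decomposition (8.6.4).
total = E(lambda a, b: (x[a][b] - mu) ** 2)
parts = (E(lambda a, b: e_u[a] ** 2) + E(lambda a, b: e_v[b] ** 2)
         + E(lambda a, b: e_uv[a][b] ** 2))
assert total == parts
```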

Now let $(u_1, \ldots, u_r)$ and $(v_1, \ldots, v_s)$ be independent samples from
c.d.f.'s $F_1(u)$ and $F_2(v)$ respectively. The c.d.f. of these two samples is
therefore

(8.6.6) $\prod_{\xi=1}^{r} F_1(u_\xi) \prod_{\eta=1}^{s} F_2(v_\eta).$

Let $(x(u_\xi, v_\eta);\ \xi = 1, \ldots, r,\ \eta = 1, \ldots, s)$ be a new set of random
variables, which may be regarded as an $r \times s$ rectangular array having $r$
rows and $s$ columns. It will be called a second-order matrix sample, but
it should be noted that this sample depends on only $r + s$ random
variables, namely, $u_1, \ldots, u_r$ and $v_1, \ldots, v_s$.

Remark. Matrix samples are fundamental in such fields as psychometrics and 
the design of experiments. In psychometrics, for instance, we may think of rows 
as representing a given population of examinees who take a certain test, and 
columns as representing a given population of questions. Thus $x(u_\xi, v_\eta)$ would 
represent the score of the person for whom $u = u_\xi$ on the question for which 
$v = v_\eta$. Then for a test of s questions given to a group of r examinees, the scores 
of the r examinees on the s questions would constitute the matrix sample 
$x(u_\xi, v_\eta)$, $\xi = 1, \ldots, r$, $\eta = 1, \ldots, s$. Lord (1955) has considered the sampling 
theory of various linear and quadratic functions of the $x(u_\xi, v_\eta)$ which arise in 
psychometrics. 

In a particular experimental design, rows may represent a population of 
operators of a certain kind of machine, and columns may represent a population 
of these machines. Thus, if in an experiment we pick r operators and s machines 
at random, we might let $x(u_\xi, v_\eta)$ be a random variable representing the output 
in an 8-hour day of the $\xi$th operator on the $\eta$th machine. Such an experiment 
would, of course, require s 8-hour days to perform. Applications of matrix 
sampling in the design of experiments will be discussed in Sections 10.6 and 10.7. 


To shorten our notation, let 

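(8.6.7)    $x_{\xi\eta} = x(u_\xi, v_\eta)$, 
           $\bar{x}_{\cdot\cdot} = \frac{1}{rs} \sum_{\xi,\eta} x_{\xi\eta}$, 
           $\bar{x}_{\xi\cdot} = \frac{1}{s} \sum_{\eta} x_{\xi\eta}$, 
           $\bar{x}_{\cdot\eta} = \frac{1}{r} \sum_{\xi} x_{\xi\eta}$, 

where $\xi = 1, \ldots, r$, $\eta = 1, \ldots, s$. 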

It should be noted that for each $\xi$ and $\eta$, (8.6.3) takes a form which we 
may write as 

(8.6.8)    $x_{\xi\eta} = \mu + \varepsilon_{\xi\cdot} + \varepsilon_{\cdot\eta} + \varepsilon_{\xi\eta}.$ 

Now let 

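(8.6.9)    $S_T = \sum_{\xi,\eta} (x_{\xi\eta} - \bar{x}_{\cdot\cdot})^2$, 
           $S_{\cdot 0} = \sum_{\xi,\eta} (\bar{x}_{\xi\cdot} - \bar{x}_{\cdot\cdot})^2$, 
           $S_{0\cdot} = \sum_{\xi,\eta} (\bar{x}_{\cdot\eta} - \bar{x}_{\cdot\cdot})^2$, 
           $S_{\cdot\cdot} = \sum_{\xi,\eta} (x_{\xi\eta} - \bar{x}_{\xi\cdot} - \bar{x}_{\cdot\eta} + \bar{x}_{\cdot\cdot})^2$, 

where it will be seen that 

           $S_T = S_{\cdot 0} + S_{0\cdot} + S_{\cdot\cdot}.$ 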
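
The decomposition of the total sum of squares can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `sums_of_squares` and the 3 × 4 array `scores` are hypothetical, and exact rational arithmetic is used so that the identity holds without rounding error.

```python
from fractions import Fraction

def sums_of_squares(x):
    """Return (S_row, S_col, S_res, S_tot) of (8.6.9) for an r x s array x."""
    r, s = len(x), len(x[0])
    grand = Fraction(sum(map(sum, x)), r * s)
    row_mean = [Fraction(sum(row), s) for row in x]
    col_mean = [Fraction(sum(x[i][j] for i in range(r)), r) for j in range(s)]
    cells = [(i, j) for i in range(r) for j in range(s)]
    # Every sum runs over all (xi, eta) cells, as in (8.6.9).
    s_row = sum((row_mean[i] - grand) ** 2 for i, j in cells)
    s_col = sum((col_mean[j] - grand) ** 2 for i, j in cells)
    s_res = sum((x[i][j] - row_mean[i] - col_mean[j] + grand) ** 2 for i, j in cells)
    s_tot = sum((x[i][j] - grand) ** 2 for i, j in cells)
    return s_row, s_col, s_res, s_tot

# Hypothetical 3 x 4 matrix sample (rows: examinees, columns: questions).
scores = [[3, 5, 4, 8], [6, 7, 9, 10], [1, 2, 2, 6]]
s_row, s_col, s_res, s_tot = sums_of_squares(scores)
print(s_row, s_col, s_res, s_tot)
```

The three components always add up to the total, whatever values are placed in the array.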

It is sometimes convenient to refer to $S_{\cdot 0}$, $S_{0\cdot}$, and $S_{\cdot\cdot}$ as the row, column, 
and residual components of the total sum of squares $S_T$. We shall consider 
the mean and variance of $\bar{x}_{\cdot\cdot}$, and the means of $S_T$, $S_{\cdot 0}$, $S_{0\cdot}$, and $S_{\cdot\cdot}$, all 
mean values being taken with respect to the c.d.f. in (8.6.6). Since 

           $\mathscr{E}(x_{\xi\eta}) = \mu$,   $\xi = 1, \ldots, r$;  $\eta = 1, \ldots, s$, 

it is evident that 

           $\mathscr{E}(\bar{x}_{\cdot\cdot}) = \mu.$ 

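In determining mean values of the S's in (8.6.9) we find in each case 
that the mean value can be expressed as a linear function of terms of the 
following four types: 

(8.6.10)   $\mathscr{E}\big((x_{\xi\eta} - \mu)(x_{\xi'\eta'} - \mu)\big)$ 
           $= \sigma^2_{\cdot 0} + \sigma^2_{0\cdot} + \sigma^2_{\cdot\cdot}$,   $\xi = \xi'$, $\eta = \eta'$, 
           $= \sigma^2_{\cdot 0}$,   $\xi = \xi'$, $\eta \neq \eta'$, 
           $= \sigma^2_{0\cdot}$,   $\xi \neq \xi'$, $\eta = \eta'$, 
           $= 0$,   $\xi \neq \xi'$, $\eta \neq \eta'$. 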

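Working out the mean values, we find 

(8.6.11)   $\mathscr{E}(S_{\cdot 0}) = s(r - 1)\big(\sigma^2_{\cdot 0} + \tfrac{1}{s}\sigma^2_{\cdot\cdot}\big)$, 
           $\mathscr{E}(S_{0\cdot}) = r(s - 1)\big(\sigma^2_{0\cdot} + \tfrac{1}{r}\sigma^2_{\cdot\cdot}\big)$, 
           $\mathscr{E}(S_{\cdot\cdot}) = (r - 1)(s - 1)\sigma^2_{\cdot\cdot}$, 

while $\mathscr{E}(S_T)$ can be determined from the equation 
$\mathscr{E}(S_T) = \mathscr{E}(S_{\cdot 0}) + \mathscr{E}(S_{0\cdot}) + \mathscr{E}(S_{\cdot\cdot})$. 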

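Similarly, the variance of $\bar{x}_{\cdot\cdot}$ is a linear function of the four types of 
mean values mentioned in (8.6.10), which turns out to be 

(8.6.12)   $\sigma^2(\bar{x}_{\cdot\cdot}) = \frac{\sigma^2_{\cdot 0}}{r} + \frac{\sigma^2_{0\cdot}}{s} + \frac{\sigma^2_{\cdot\cdot}}{rs}.$ 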

Summarizing, 

8.6.2 Let $(u_1, \ldots, u_r)$ and $(v_1, \ldots, v_s)$ be independent samples from 
infinite populations. Let $(x_{\xi\eta};\ \xi = 1, \ldots, r;\ \eta = 1, \ldots, s)$ be a 
matrix sample whose mean $\bar{x}_{\cdot\cdot}$ is defined by (8.6.7), and whose row, 
column, and residual components $S_{\cdot 0}$, $S_{0\cdot}$, and $S_{\cdot\cdot}$ respectively, of the 
total sum of squares are defined by (8.6.9). Then $\mathscr{E}(\bar{x}_{\cdot\cdot}) = \mu$ 
and $\sigma^2(\bar{x}_{\cdot\cdot})$ is given by (8.6.12), whereas the mean values of $S_{\cdot 0}$, $S_{0\cdot}$, 
and $S_{\cdot\cdot}$ are given by (8.6.11). 
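
The statements in 8.6.2 can be verified exactly in a small discrete case. The sketch below is an illustration only, under the assumption that u and v are uniformly distributed over the rows and columns of a hypothetical table `X` of values x(u, v); sampling is with replacement, so the draws are independent as the theorem requires. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def variance_components(X):
    """mu and the components sigma^2_{.0}, sigma^2_{0.}, sigma^2_{..} of (8.6.5)
    when u and v are uniform on the rows and columns of X."""
    n1, n2 = len(X), len(X[0])
    mu = Fraction(sum(map(sum, X)), n1 * n2)
    mu_u = [Fraction(sum(X[i]), n2) for i in range(n1)]
    mu_v = [Fraction(sum(X[i][j] for i in range(n1)), n1) for j in range(n2)]
    var_row = sum((m - mu) ** 2 for m in mu_u) / n1
    var_col = sum((m - mu) ** 2 for m in mu_v) / n2
    var_res = sum((X[i][j] - mu_u[i] - mu_v[j] + mu) ** 2
                  for i in range(n1) for j in range(n2)) / (n1 * n2)
    return mu, var_row, var_col, var_res

def sample_statistics(x):
    """Sample mean and the row, column, residual sums of squares of (8.6.9)."""
    r, s = len(x), len(x[0])
    grand = Fraction(sum(map(sum, x)), r * s)
    rm = [Fraction(sum(row), s) for row in x]
    cm = [Fraction(sum(x[i][j] for i in range(r)), r) for j in range(s)]
    s_row = s * sum((m - grand) ** 2 for m in rm)
    s_col = r * sum((m - grand) ** 2 for m in cm)
    s_res = sum((x[i][j] - rm[i] - cm[j] + grand) ** 2
                for i in range(r) for j in range(s))
    return grand, s_row, s_col, s_res

def exact_moments(X, r, s):
    """Average over every equally likely draw of r u-values and s v-values."""
    n1, n2 = len(X), len(X[0])
    stats = [sample_statistics([[X[i][j] for j in cols] for i in rows])
             for rows in product(range(n1), repeat=r)
             for cols in product(range(n2), repeat=s)]
    n = len(stats)
    means = [sum(st[k] for st in stats) / n for k in range(4)]
    var_mean = sum((st[0] - means[0]) ** 2 for st in stats) / n
    return means, var_mean

X = [[1, 4, 2], [3, 0, 5]]          # hypothetical values x(u, v)
r, s = 2, 2
mu, v_row, v_col, v_res = variance_components(X)
(mean_x, e_row, e_col, e_res), var_x = exact_moments(X, r, s)
print(e_row, s * (r - 1) * (v_row + v_res / s))
```

The enumeration grows as $N_1^r N_2^s$, so this approach is only practical for very small tables.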





(b) Case of Finite Row and Column Populations 

The preceding results can be carried out in a straightforward manner 
for the case where $(u_1, \ldots, u_r)$ and $(v_1, \ldots, v_s)$ are independent random 
samples from finite populations $\pi_{N_1}$ and $\pi_{N_2}$. This case has been 
considered by Hooke (1956a) and Tukey (1950). 

In the case of finite populations we may, without loss of generality, 
consider the elements in $\pi_{N_1}$ as having distinct values of u, namely, 
$u_{o1} < \cdots < u_{oN_1}$. Similarly, we may take $v_{o1} < \cdots < v_{oN_2}$ as the distinct 
values of v in $\pi_{N_2}$. It will be convenient to use the notation 


(8.6.13)   $x_{oij} = x(u_{oi}, v_{oj})$,   $x'_{oij} = x_{oij} - \mu_f$, 


and corresponding to (8.6.1), 

(8.6.14)   $\mu_f = \frac{1}{N_1 N_2} \sum_{i,j} x_{oij}$, 
           $\mu_{fi\cdot} = \frac{1}{N_2} \sum_{j} x_{oij}$, 
           $\mu_{f\cdot j} = \frac{1}{N_1} \sum_{i} x_{oij}$. 

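For the case of finite populations of rows and columns, we achieve 
some simplification of mean values of $S_{\cdot 0}$, $S_{0\cdot}$, $S_{\cdot\cdot}$ by defining $\sigma_f^2$, $\sigma^2_{f\cdot 0}$, 
$\sigma^2_{f0\cdot}$, $\sigma^2_{f\cdot\cdot}$ as follows: 

(8.6.15)   $\sigma_f^2 = \frac{1}{N_1 N_2 - 1} \sum_{i,j} (x_{oij} - \mu_f)^2$, 
           $\sigma^2_{f\cdot 0} = \frac{1}{N_2 (N_1 - 1)} \sum_{i,j} (\mu_{fi\cdot} - \mu_f)^2$, 
           $\sigma^2_{f0\cdot} = \frac{1}{N_1 (N_2 - 1)} \sum_{i,j} (\mu_{f\cdot j} - \mu_f)^2$, 
           $\sigma^2_{f\cdot\cdot} = \frac{1}{(N_1 - 1)(N_2 - 1)} \sum_{i,j} (x_{oij} - \mu_{fi\cdot} - \mu_{f\cdot j} + \mu_f)^2$. 

Note, however, that the equation relating $\sigma_f^2$, $\sigma^2_{f\cdot 0}$, $\sigma^2_{f0\cdot}$, and $\sigma^2_{f\cdot\cdot}$ is 

(8.6.16)   $\frac{N_1 N_2 - 1}{N_1 N_2}\,\sigma_f^2 = \frac{N_1 - 1}{N_1}\,\sigma^2_{f\cdot 0} + \frac{N_2 - 1}{N_2}\,\sigma^2_{f0\cdot} + \frac{(N_1 - 1)(N_2 - 1)}{N_1 N_2}\,\sigma^2_{f\cdot\cdot}$, 

but that (8.6.16) reduces to (8.6.4a) in the limit as $N_1, N_2 \to \infty$. 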
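
We shall be interested in the mean value and variance of $\bar{x}_{\cdot\cdot}$ and the 
mean value of $S_{0\cdot}$, $S_{\cdot 0}$, and $S_{\cdot\cdot}$. To determine these mean values we shall 
find it convenient to define the following quantities: 

(8.6.17)   $T_{(\cdot\cdot)} = \sum_{i,j} x'^{\,2}_{oij}$, 
           $T_{(\cdot 0)} = \sum_{i} \sum_{j \neq j'} x'_{oij}\, x'_{oij'}$, 
           $T_{(0\cdot)} = \sum_{i \neq i'} \sum_{j} x'_{oij}\, x'_{oi'j}$, 
           $T_{(00)} = \sum_{i \neq i'} \sum_{j \neq j'} x'_{oij}\, x'_{oi'j'}$, 

where $x'_{oij} = x_{oij} - \mu_f$ and where it will be seen that 

(8.6.18)   $T_{(\cdot\cdot)} + T_{(\cdot 0)} + T_{(0\cdot)} + T_{(00)} = \Big[\sum_{i,j} x'_{oij}\Big]^2 = 0.$ 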


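Now $\sigma^2_{f\cdot 0}$, $\sigma^2_{f0\cdot}$, and $\sigma^2_{f\cdot\cdot}$ are linear functions of $T_{(\cdot\cdot)}$, $T_{(\cdot 0)}$, and $T_{(0\cdot)}$ 
such that, when solved for the T's and making use of (8.6.18), we obtain 

(8.6.19)   $T_{(\cdot\cdot)} = (N_1 - 1)(N_2 - 1)\Big[\sigma^2_{f\cdot\cdot} + \frac{N_2}{N_2 - 1}\sigma^2_{f\cdot 0} + \frac{N_1}{N_1 - 1}\sigma^2_{f0\cdot}\Big]$, 
           $T_{(\cdot 0)} = (N_1 - 1)(N_2 - 1)\Big[-\sigma^2_{f\cdot\cdot} + N_2\,\sigma^2_{f\cdot 0} - \frac{N_1}{N_1 - 1}\sigma^2_{f0\cdot}\Big]$, 
           $T_{(0\cdot)} = (N_1 - 1)(N_2 - 1)\Big[-\sigma^2_{f\cdot\cdot} + N_1\,\sigma^2_{f0\cdot} - \frac{N_2}{N_2 - 1}\sigma^2_{f\cdot 0}\Big]$, 
           $T_{(00)} = (N_1 - 1)(N_2 - 1)\big[\sigma^2_{f\cdot\cdot} - N_1\,\sigma^2_{f0\cdot} - N_2\,\sigma^2_{f\cdot 0}\big]$. 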


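As in the case of infinite row and column populations, let $(x_{\xi\eta};\ \xi = 1, \ldots, r;\ \eta = 1, \ldots, s)$ 
be the matrix sample, with $x_{\xi\eta}$ as given by (8.6.7), and let $T_{\cdot\cdot}$, $T_{\cdot 0}$, $T_{0\cdot}$, and $T_{00}$ 
be defined from the $x'_{\xi\eta} = x_{\xi\eta} - \mu_f$ in the same manner in which 
$T_{(\cdot\cdot)}$, $T_{(\cdot 0)}$, $T_{(0\cdot)}$, and $T_{(00)}$ are defined from the $x'_{oij}$. Note that the sample 
random variables $T_{\cdot\cdot}$, $T_{\cdot 0}$, $T_{0\cdot}$, and $T_{00}$ do not satisfy (8.6.18). 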

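It can be readily shown that 

(8.6.20)   $\mathscr{E}(T_{\cdot\cdot}) = \frac{rs}{N_1 N_2}\, T_{(\cdot\cdot)}$, 
           $\mathscr{E}(T_{\cdot 0}) = \frac{rs(s - 1)}{N_1 N_2 (N_2 - 1)}\, T_{(\cdot 0)}$, 
           $\mathscr{E}(T_{0\cdot}) = \frac{rs(r - 1)}{N_1 N_2 (N_1 - 1)}\, T_{(0\cdot)}$, 
           $\mathscr{E}(T_{00}) = \frac{rs(r - 1)(s - 1)}{N_1 N_2 (N_1 - 1)(N_2 - 1)}\, T_{(00)}$, 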
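
which, as we shall see presently, are the basic formulas in the problem of 
finding the mean values of $S_{\cdot 0}$, $S_{0\cdot}$, $S_{\cdot\cdot}$. 
Now $S_{\cdot 0}$, $S_{0\cdot}$, and $S_{\cdot\cdot}$ are the following linear functions of $T_{\cdot\cdot}$, $T_{\cdot 0}$, $T_{0\cdot}$, 
and $T_{00}$: 

(8.6.21)   $S_{\cdot 0} = \frac{r - 1}{rs}\,(T_{\cdot\cdot} + T_{\cdot 0}) - \frac{1}{rs}\,(T_{0\cdot} + T_{00})$, 
           $S_{0\cdot} = \frac{s - 1}{rs}\,(T_{\cdot\cdot} + T_{0\cdot}) - \frac{1}{rs}\,(T_{\cdot 0} + T_{00})$, 
           $S_{\cdot\cdot} = \frac{(r - 1)(s - 1)}{rs}\,T_{\cdot\cdot} - \frac{r - 1}{rs}\,T_{\cdot 0} - \frac{s - 1}{rs}\,T_{0\cdot} + \frac{1}{rs}\,T_{00}$. 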


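Taking mean values, making use of (8.6.20), and substituting values of 
$T_{(\cdot\cdot)}$, $T_{(\cdot 0)}$, $T_{(0\cdot)}$, and $T_{(00)}$ from (8.6.19), we find the following mean values of 
the S's: 

(8.6.22)   $\mathscr{E}(S_{\cdot 0}) = s(r - 1)\Big[\sigma^2_{f\cdot 0} + \Big(\frac{1}{s} - \frac{1}{N_2}\Big)\sigma^2_{f\cdot\cdot}\Big]$, 
           $\mathscr{E}(S_{0\cdot}) = r(s - 1)\Big[\sigma^2_{f0\cdot} + \Big(\frac{1}{r} - \frac{1}{N_1}\Big)\sigma^2_{f\cdot\cdot}\Big]$, 
           $\mathscr{E}(S_{\cdot\cdot}) = (r - 1)(s - 1)\,\sigma^2_{f\cdot\cdot}$. 

The value of $\mathscr{E}(S_T)$ can be determined from the equation 
$\mathscr{E}(S_T) = \mathscr{E}(S_{\cdot 0}) + \mathscr{E}(S_{0\cdot}) + \mathscr{E}(S_{\cdot\cdot})$. 

Note that these equations reduce to (8.6.11) as $N_1, N_2 \to \infty$, if $\sigma^2_{f\cdot\cdot}$, $\sigma^2_{f0\cdot}$, 
$\sigma^2_{f\cdot 0}$ have $\sigma^2_{\cdot\cdot}$, $\sigma^2_{0\cdot}$, $\sigma^2_{\cdot 0}$ as their limits respectively. 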

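To find the variance of $\bar{x}_{\cdot\cdot}$ we merely have to determine the mean value 
of $(\bar{x}_{\cdot\cdot} - \mu_f)^2$, which can be expressed as follows: 

(8.6.23)   $(\bar{x}_{\cdot\cdot} - \mu_f)^2 = \frac{1}{r^2 s^2}\,[T_{\cdot\cdot} + T_{\cdot 0} + T_{0\cdot} + T_{00}].$ 

Taking mean values of both sides of (8.6.23) and making use of (8.6.19) 
and (8.6.20) we find 

(8.6.24)   $\sigma^2(\bar{x}_{\cdot\cdot}) = \Big(\frac{1}{r} - \frac{1}{N_1}\Big)\sigma^2_{f\cdot 0} + \Big(\frac{1}{s} - \frac{1}{N_2}\Big)\sigma^2_{f0\cdot} + \Big(\frac{1}{r} - \frac{1}{N_1}\Big)\Big(\frac{1}{s} - \frac{1}{N_2}\Big)\sigma^2_{f\cdot\cdot}.$ 

The reader will note that this formula reduces to (8.6.12) as $N_1, N_2 \to \infty$. 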

Formulation of a summary statement similar to 8.6.2 for matrix sampling 
in the case of finite populations of rows and columns is left to the reader. 
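
The finite-population mean values can be checked by exhaustive enumeration. The sketch below is illustrative only: it takes a hypothetical 3 × 3 population table `X`, draws r rows and s columns without replacement in every possible order, and compares the exact averages with the finite-population formulas for the mean sums of squares and for the variance of the sample mean. The variance components are computed with the divisors $N_1 - 1$, $N_2 - 1$, and $(N_1 - 1)(N_2 - 1)$, which is the reading of (8.6.15) assumed here.

```python
from fractions import Fraction
from itertools import permutations

def finite_components(X):
    """mu_f and the finite-population components (row, column, residual)."""
    n1, n2 = len(X), len(X[0])
    mu = Fraction(sum(map(sum, X)), n1 * n2)
    mu_i = [Fraction(sum(X[i]), n2) for i in range(n1)]
    mu_j = [Fraction(sum(X[i][j] for i in range(n1)), n1) for j in range(n2)]
    var_row = sum((m - mu) ** 2 for m in mu_i) / (n1 - 1)
    var_col = sum((m - mu) ** 2 for m in mu_j) / (n2 - 1)
    var_res = sum((X[i][j] - mu_i[i] - mu_j[j] + mu) ** 2
                  for i in range(n1) for j in range(n2)) / ((n1 - 1) * (n2 - 1))
    return mu, var_row, var_col, var_res

def sample_statistics(x):
    """Sample mean and the row, column, residual sums of squares of (8.6.9)."""
    r, s = len(x), len(x[0])
    grand = Fraction(sum(map(sum, x)), r * s)
    rm = [Fraction(sum(row), s) for row in x]
    cm = [Fraction(sum(x[i][j] for i in range(r)), r) for j in range(s)]
    s_row = s * sum((m - grand) ** 2 for m in rm)
    s_col = r * sum((m - grand) ** 2 for m in cm)
    s_res = sum((x[i][j] - rm[i] - cm[j] + grand) ** 2
                for i in range(r) for j in range(s))
    return grand, s_row, s_col, s_res

def exact_moments(X, r, s):
    """Average over all ordered samples of r rows and s columns, without replacement."""
    n1, n2 = len(X), len(X[0])
    stats = [sample_statistics([[X[i][j] for j in cols] for i in rows])
             for rows in permutations(range(n1), r)
             for cols in permutations(range(n2), s)]
    n = len(stats)
    means = [sum(st[k] for st in stats) / n for k in range(4)]
    var_mean = sum((st[0] - means[0]) ** 2 for st in stats) / n
    return means, var_mean

X = [[2, 7, 1], [4, 4, 9], [0, 5, 3]]   # hypothetical population values x_{oij}
N1, N2, r, s = 3, 3, 2, 2
mu_f, v_row, v_col, v_res = finite_components(X)
(mean_x, e_row, e_col, e_res), var_x = exact_moments(X, r, s)
fr = Fraction(1, r) - Fraction(1, N1)    # 1/r - 1/N1
fs = Fraction(1, s) - Fraction(1, N2)    # 1/s - 1/N2
print(e_row, s * (r - 1) * (v_row + fs * v_res))
```

Setting r = N1 and s = N2 makes both correction factors vanish, so the variance of the sample mean is zero, as it must be when the whole population is observed.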

(c) Third-Order Matrix Samples 

The ideas in the preceding pages extend without major difficulties to 
the case of third- and higher-order matrix samples. To indicate the varieties 
of variance components which arise in these higher-order cases it should 
be sufficient to consider the third-order case briefly. Here we are 
concerned with a random variable $x(u, v, w)$, where u, v, w are independent 
random variables with c.d.f.'s $F_1(u)$, $F_2(v)$, $F_3(w)$. 

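For $x(u, v, w)$ we define the following means 

(8.6.25)   $\mu = \mathscr{E}(x(u, v, w))$, 
           $\mu_{u\cdot\cdot} = \int\!\!\int x(u, v, w)\, dF_2(v)\, dF_3(w)$;   similarly for $\mu_{\cdot v\cdot}$, $\mu_{\cdot\cdot w}$, 
           $\mu_{uv\cdot} = \int x(u, v, w)\, dF_3(w)$;   similarly for $\mu_{u\cdot w}$, $\mu_{\cdot vw}$, 

and random variables 

(8.6.26)   $\varepsilon_{u\cdot\cdot} = \mu_{u\cdot\cdot} - \mu$;   similarly for $\varepsilon_{\cdot v\cdot}$, $\varepsilon_{\cdot\cdot w}$, 
           $\varepsilon_{uv\cdot} = \mu_{uv\cdot} - \mu_{u\cdot\cdot} - \mu_{\cdot v\cdot} + \mu$;   similarly for $\varepsilon_{u\cdot w}$, $\varepsilon_{\cdot vw}$, 
           $\varepsilon_{uvw} = x(u, v, w) - \mu_{uv\cdot} - \mu_{u\cdot w} - \mu_{\cdot vw} + \mu_{u\cdot\cdot} + \mu_{\cdot v\cdot} + \mu_{\cdot\cdot w} - \mu$. 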

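Then we have the following extension of 8.6.1: 

8.6.3 If u, v, w are independent random variables and if $x(u, v, w)$ is a 
random variable, then 

           $x(u, v, w) = \mu + \varepsilon_{u\cdot\cdot} + \varepsilon_{\cdot v\cdot} + \varepsilon_{\cdot\cdot w} + \varepsilon_{uv\cdot} + \varepsilon_{u\cdot w} + \varepsilon_{\cdot vw} + \varepsilon_{uvw}$ 

where the $\varepsilon$'s are random variables defined by (8.6.26) which have 
zero means and zero covariances. Furthermore 

(8.6.27)   $\sigma^2(x(u, v, w)) = \sigma^2(\varepsilon_{u\cdot\cdot}) + \sigma^2(\varepsilon_{\cdot v\cdot}) + \sigma^2(\varepsilon_{\cdot\cdot w}) + \sigma^2(\varepsilon_{uv\cdot}) + \sigma^2(\varepsilon_{u\cdot w}) + \sigma^2(\varepsilon_{\cdot vw}) + \sigma^2(\varepsilon_{uvw}).$ 

It will be convenient to let 

           $\sigma^2(x(u, v, w)) = \sigma^2$; 
           $\sigma^2(\varepsilon_{u\cdot\cdot}) = \sigma^2_{\cdot 00}$;   $\sigma^2(\varepsilon_{\cdot v\cdot}) = \sigma^2_{0\cdot 0}$;   $\sigma^2(\varepsilon_{\cdot\cdot w}) = \sigma^2_{00\cdot}$; 
           $\sigma^2(\varepsilon_{uv\cdot}) = \sigma^2_{\cdot\cdot 0}$;   $\sigma^2(\varepsilon_{u\cdot w}) = \sigma^2_{\cdot 0\cdot}$;   $\sigma^2(\varepsilon_{\cdot vw}) = \sigma^2_{0\cdot\cdot}$; 
           $\sigma^2(\varepsilon_{uvw}) = \sigma^2_{\cdot\cdot\cdot}$. 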





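Then (8.6.27) may be written as 

(8.6.28)   $\sigma^2 = \sigma^2_{\cdot 00} + \sigma^2_{0\cdot 0} + \sigma^2_{00\cdot} + \sigma^2_{\cdot\cdot 0} + \sigma^2_{\cdot 0\cdot} + \sigma^2_{0\cdot\cdot} + \sigma^2_{\cdot\cdot\cdot}.$ 

Now let $(u_1, \ldots, u_r)$, $(v_1, \ldots, v_s)$, $(w_1, \ldots, w_t)$ be independent samples 
from infinite populations having c.d.f.'s $F_1(u)$, $F_2(v)$, and $F_3(w)$. The c.d.f. 
of the sample is 

(8.6.29)   $\prod_{\xi=1}^{r} F_1(u_\xi) \cdot \prod_{\eta=1}^{s} F_2(v_\eta) \cdot \prod_{\zeta=1}^{t} F_3(w_\zeta).$ 

Our third-order matrix sample is the set of random variables 

           $(x(u_\xi, v_\eta, w_\zeta);\ \xi = 1, \ldots, r;\ \eta = 1, \ldots, s;\ \zeta = 1, \ldots, t)$ 

which may be regarded as a third-order matrix with r rows, s columns, and 
t layers. 

For this sample we define $x_{\xi\eta\zeta} = x(u_\xi, v_\eta, w_\zeta)$ and the various sample means 
$\bar{x}_{\cdot\cdot\cdot}$, $\bar{x}_{\xi\cdot\cdot}$, $\bar{x}_{\cdot\eta\cdot}$, $\bar{x}_{\cdot\cdot\zeta}$, $\bar{x}_{\xi\eta\cdot}$, $\bar{x}_{\xi\cdot\zeta}$, $\bar{x}_{\cdot\eta\zeta}$ by direct extension of the definitions (8.6.7). 

Each random variable $x_{\xi\eta\zeta}$ is a sum of random variables having the 
properties stated in 8.6.3; we may write 

(8.6.30)   $x_{\xi\eta\zeta} = \mu + \varepsilon_{\xi\cdot\cdot} + \varepsilon_{\cdot\eta\cdot} + \varepsilon_{\cdot\cdot\zeta} + \varepsilon_{\xi\eta\cdot} + \varepsilon_{\xi\cdot\zeta} + \varepsilon_{\cdot\eta\zeta} + \varepsilon_{\xi\eta\zeta}.$ 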

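Furthermore, we define the sums of squares 

(8.6.31)   $S_T = \sum_{\xi,\eta,\zeta} (x_{\xi\eta\zeta} - \bar{x}_{\cdot\cdot\cdot})^2$, 
           $S_{\cdot 00} = \sum_{\xi,\eta,\zeta} (\bar{x}_{\xi\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot})^2$;   similarly for $S_{0\cdot 0}$ and $S_{00\cdot}$; 
           $S_{\cdot\cdot 0} = \sum_{\xi,\eta,\zeta} (\bar{x}_{\xi\eta\cdot} - \bar{x}_{\xi\cdot\cdot} - \bar{x}_{\cdot\eta\cdot} + \bar{x}_{\cdot\cdot\cdot})^2$;   similarly for $S_{\cdot 0\cdot}$ and $S_{0\cdot\cdot}$; 
           $S_{\cdot\cdot\cdot} = \sum_{\xi,\eta,\zeta} (x_{\xi\eta\zeta} - \bar{x}_{\xi\eta\cdot} - \bar{x}_{\xi\cdot\zeta} - \bar{x}_{\cdot\eta\zeta} + \bar{x}_{\xi\cdot\cdot} + \bar{x}_{\cdot\eta\cdot} + \bar{x}_{\cdot\cdot\zeta} - \bar{x}_{\cdot\cdot\cdot})^2$ 

where it can be verified that 

(8.6.32)   $S_T = S_{\cdot 00} + S_{0\cdot 0} + S_{00\cdot} + S_{\cdot\cdot 0} + S_{\cdot 0\cdot} + S_{0\cdot\cdot} + S_{\cdot\cdot\cdot}.$ 

It is customary to refer to $S_{\cdot 00}$ as the row component of $S_T$, with similar 
interpretations for $S_{0\cdot 0}$ and $S_{00\cdot}$; to $S_{\cdot\cdot 0}$ as the row-column interaction 
component of $S_T$, with similar statements for $S_{\cdot 0\cdot}$ and $S_{0\cdot\cdot}$; and to $S_{\cdot\cdot\cdot}$ as the 
residual component of $S_T$. 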
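
The seven-component decomposition (8.6.32) can be checked numerically for any complete r × s × t array. The following sketch is illustrative only; the function name and the small array `x3` are hypothetical, and indices are 0-based (axis 0 for rows, 1 for columns, 2 for layers).

```python
from fractions import Fraction
from itertools import product

def third_order_components(x):
    """Main-effect, interaction, residual, and total sums of squares of (8.6.31)."""
    r, s, t = len(x), len(x[0]), len(x[0][0])
    cells = list(product(range(r), range(s), range(t)))

    def mean(keep):
        """Average x over the axes not listed in `keep`, keyed by the kept indices."""
        sums, counts = {}, {}
        for c in cells:
            key = tuple(c[d] for d in keep)
            sums[key] = sums.get(key, 0) + x[c[0]][c[1]][c[2]]
            counts[key] = counts.get(key, 0) + 1
        return {k: Fraction(sums[k], counts[k]) for k in sums}

    g = mean(())[()]                                   # grand mean
    m1 = {d: mean((d,)) for d in range(3)}             # one-way means
    m2 = {p: mean(p) for p in [(0, 1), (0, 2), (1, 2)]}  # two-way means

    main = {d: sum((m1[d][(c[d],)] - g) ** 2 for c in cells) for d in range(3)}
    inter = {}
    for a, b in m2:
        inter[(a, b)] = sum((m2[(a, b)][(c[a], c[b])] - m1[a][(c[a],)]
                             - m1[b][(c[b],)] + g) ** 2 for c in cells)
    resid = sum((x[i][j][k]
                 - m2[(0, 1)][(i, j)] - m2[(0, 2)][(i, k)] - m2[(1, 2)][(j, k)]
                 + m1[0][(i,)] + m1[1][(j,)] + m1[2][(k,)] - g) ** 2
                for i, j, k in cells)
    total = sum((x[i][j][k] - g) ** 2 for i, j, k in cells)
    return main, inter, resid, total

# Hypothetical 2 x 3 x 2 third-order matrix sample.
x3 = [[[1, 4], [2, 2], [7, 5]],
      [[3, 0], [6, 8], [1, 9]]]
main, inter, resid, total = third_order_components(x3)
print(sum(main.values()) + sum(inter.values()) + resid, total)
```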
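Now it is evident that 

           $\mathscr{E}(\bar{x}_{\cdot\cdot\cdot}) = \mu.$ 

But to determine the mean values of the S's and the variance of $\bar{x}_{\cdot\cdot\cdot}$ we 
find that the mean value in each case is a linear function of quantities like 
the following: 

(8.6.33)   $\mathscr{E}\big((x_{\xi\eta\zeta} - \mu)(x_{\xi'\eta'\zeta'} - \mu)\big)$ 
           $= \sigma^2$,   $\xi = \xi'$, $\eta = \eta'$, $\zeta = \zeta'$, 
           $= \sigma^2_{\cdot\cdot 0} + \sigma^2_{\cdot 00} + \sigma^2_{0\cdot 0}$,   $\xi = \xi'$, $\eta = \eta'$, $\zeta \neq \zeta'$, 
           $= \sigma^2_{\cdot 0\cdot} + \sigma^2_{\cdot 00} + \sigma^2_{00\cdot}$,   $\xi = \xi'$, $\eta \neq \eta'$, $\zeta = \zeta'$, 
           $= \sigma^2_{0\cdot\cdot} + \sigma^2_{0\cdot 0} + \sigma^2_{00\cdot}$,   $\xi \neq \xi'$, $\eta = \eta'$, $\zeta = \zeta'$, 
           $= \sigma^2_{\cdot 00}$,   $\xi = \xi'$, $\eta \neq \eta'$, $\zeta \neq \zeta'$, 
           $= \sigma^2_{0\cdot 0}$,   $\xi \neq \xi'$, $\eta = \eta'$, $\zeta \neq \zeta'$, 
           $= \sigma^2_{00\cdot}$,   $\xi \neq \xi'$, $\eta \neq \eta'$, $\zeta = \zeta'$, 
           $= 0$,   $\xi \neq \xi'$, $\eta \neq \eta'$, $\zeta \neq \zeta'$, 

where $\sigma^2$ is given by (8.6.28). 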

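Evaluating mean values of the S's, we find 

(8.6.34)   $\mathscr{E}(S_{00\cdot}) = rs(t - 1)\Big[\sigma^2_{00\cdot} + \frac{1}{s}\sigma^2_{0\cdot\cdot} + \frac{1}{r}\sigma^2_{\cdot 0\cdot} + \frac{1}{rs}\sigma^2_{\cdot\cdot\cdot}\Big]$, 
           $\mathscr{E}(S_{0\cdot 0}) = rt(s - 1)\Big[\sigma^2_{0\cdot 0} + \frac{1}{t}\sigma^2_{0\cdot\cdot} + \frac{1}{r}\sigma^2_{\cdot\cdot 0} + \frac{1}{rt}\sigma^2_{\cdot\cdot\cdot}\Big]$, 
           $\mathscr{E}(S_{\cdot 00}) = st(r - 1)\Big[\sigma^2_{\cdot 00} + \frac{1}{t}\sigma^2_{\cdot 0\cdot} + \frac{1}{s}\sigma^2_{\cdot\cdot 0} + \frac{1}{st}\sigma^2_{\cdot\cdot\cdot}\Big]$, 
           $\mathscr{E}(S_{0\cdot\cdot}) = r(s - 1)(t - 1)\Big[\sigma^2_{0\cdot\cdot} + \frac{1}{r}\sigma^2_{\cdot\cdot\cdot}\Big]$, 
           $\mathscr{E}(S_{\cdot 0\cdot}) = s(r - 1)(t - 1)\Big[\sigma^2_{\cdot 0\cdot} + \frac{1}{s}\sigma^2_{\cdot\cdot\cdot}\Big]$, 
           $\mathscr{E}(S_{\cdot\cdot 0}) = t(r - 1)(s - 1)\Big[\sigma^2_{\cdot\cdot 0} + \frac{1}{t}\sigma^2_{\cdot\cdot\cdot}\Big]$, 
           $\mathscr{E}(S_{\cdot\cdot\cdot}) = (r - 1)(s - 1)(t - 1)\,\sigma^2_{\cdot\cdot\cdot}$, 

and, of course, the value of $\mathscr{E}(S_T)$ is the sum of the quantities on the right 
of (8.6.34). 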

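The variance of $\bar{x}_{\cdot\cdot\cdot}$ is similarly a linear function of quantities in (8.6.33), 
and has the value 

(8.6.35)   $\sigma^2(\bar{x}_{\cdot\cdot\cdot}) = \frac{1}{r}\sigma^2_{\cdot 00} + \frac{1}{s}\sigma^2_{0\cdot 0} + \frac{1}{t}\sigma^2_{00\cdot} + \frac{1}{rs}\sigma^2_{\cdot\cdot 0} + \frac{1}{rt}\sigma^2_{\cdot 0\cdot} + \frac{1}{st}\sigma^2_{0\cdot\cdot} + \frac{1}{rst}\sigma^2_{\cdot\cdot\cdot}.$ 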

We leave to the reader the task of formulating the extension of 8.6.2 to 
the case of third-order matrix samples for the case of infinite populations 
of rows, columns, and layers. 

If the reader will compare the structure of the equations in (8.6.11) with 
those in (8.6.22) he should have no difficulty in surmising the form which 
(8.6.34) will take if, for the matrix sample $\{x(u_\xi, v_\eta, w_\zeta)\}$, $(u_1, \ldots, u_r)$, 
$(v_1, \ldots, v_s)$, and $(w_1, \ldots, w_t)$ are independent samples from finite row, 
column, and layer populations, respectively. Similarly, by comparing 
(8.6.12) with (8.6.24), he will infer the form which formula (8.6.35) for 
$\sigma^2(\bar{x}_{\cdot\cdot\cdot})$ will take in the finite-population case. We leave it to the reader 
as an exercise to write down and verify these formulas. 

(d) Balanced Incomplete Matrix Samples 

Let us return to the second-order matrix sample $(x_{\xi\eta};\ \xi = 1, \ldots, r;\ 
\eta = 1, \ldots, s)$, which we denote by $\{x_{\xi\eta}\}$, and select a subsample $\{x_{\xi\eta}\}^*$ 
consisting of s' elements from every row and r' elements from every 
column of $\{x_{\xi\eta}\}$. If n is the sample size, then n, r, s, r', s' must be positive 
integers satisfying the conditions 

(8.6.36)   $n = rs' = r's.$ 

Let the set of pairs of values of $(\xi, \eta)$ for this selection be denoted by G*. 
We shall call such a sample a balanced incomplete matrix sample. 
Examples in which such selections G* can be made are: n = 12, r = 3, 
s = 6, r' = 2, s' = 4; n = 16, r = 4, s = 8, r' = 2, s' = 4; n = 40, r = 5, 
s = 10, r' = 4, s' = 8. The reader interested in the details of the 
construction of specific patterns of selection of such samples is referred to 
papers by Bose (1939), Connor (1952), Shrikhande (1952), and to books 
by Cochran and Cox (1957), Kempthorne (1952). 
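
Whether a proposed selection G* meets condition (8.6.36) is easy to check mechanically. The sketch below is only an illustration: the cyclic pattern used to build `G` is a hypothetical construction chosen for the case n = 12, r = 3, s = 6, and rows and columns are numbered from 0 rather than 1.

```python
from collections import Counter

def balance(G, r, s):
    """Return (s_prime, r_prime) if every row of the selection G holds the same
    number of elements and every column does too; otherwise return None."""
    pairs = set(G)
    row_counts = Counter(xi for xi, eta in pairs)
    col_counts = Counter(eta for xi, eta in pairs)
    per_row = {row_counts.get(xi, 0) for xi in range(r)}
    per_col = {col_counts.get(eta, 0) for eta in range(s)}
    if len(per_row) != 1 or len(per_col) != 1:
        return None
    return per_row.pop(), per_col.pop()

# Hypothetical cyclic pattern: column eta uses rows eta and eta + 1 (mod r).
r, s = 3, 6
G = [((eta + shift) % r, eta) for eta in range(s) for shift in (0, 1)]
s_prime, r_prime = balance(G, r, s)
n = len(G)
print(n, r_prime, s_prime)
```

A selection that fails the check, such as three cells of a 2 × 2 array, returns `None`.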

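Let the mean of the incomplete sample, the mean of the $\xi$th row and the 
$\eta$th column be denoted by $\bar{x}^*$, $\bar{x}^*_{\xi\cdot}$, $\bar{x}^*_{\cdot\eta}$ respectively, and let 

(8.6.37)   $S_T^* = \sum^* (x_{\xi\eta} - \bar{x}^*)^2$, 
           $S_{\cdot 0}^* = \sum^* (\bar{x}^*_{\xi\cdot} - \bar{x}^*)^2$, 
           $S_{0\cdot}^* = \sum^* (\bar{x}^*_{\cdot\eta} - \bar{x}^*)^2$, 
           $S_e^* = S_T^* - S_{\cdot 0}^* - S_{0\cdot}^*$, 

where $\sum^*$ denotes summation for all $(\xi, \eta) \in G^*$. $S_e^*$ is the residual sum 
of squares. 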





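We shall consider the variance of $\bar{x}^*$ and the mean values of the $S^*$'s 
in the case of infinite populations of rows and columns. In this case, all 
these mean values are linear functions of quantities like those in (8.6.10). 
The evaluation of $\mathscr{E}(S_T^*)$, $\mathscr{E}(S_{\cdot 0}^*)$, and $\mathscr{E}(S_{0\cdot}^*)$ is straightforward 
whereas 

(8.6.38)   $\mathscr{E}(S_e^*) = \mathscr{E}(S_T^*) - \mathscr{E}(S_{\cdot 0}^*) - \mathscr{E}(S_{0\cdot}^*).$ 

Letting $r's = rs' = n$, and omitting the details, we find 

(8.6.39)   $\mathscr{E}(\bar{x}^*) = \mu$, 
           $\sigma^2(\bar{x}^*) = \frac{1}{n}\sigma^2_{\cdot\cdot} + \frac{1}{r}\sigma^2_{\cdot 0} + \frac{1}{s}\sigma^2_{0\cdot}$, 
           $\mathscr{E}(S_{\cdot 0}^*) = (r - 1)\sigma^2_{\cdot\cdot} + (r - 1)s'\,\sigma^2_{\cdot 0} + (r - r')\,\sigma^2_{0\cdot}$, 
           $\mathscr{E}(S_{0\cdot}^*) = (s - 1)\sigma^2_{\cdot\cdot} + (s - s')\,\sigma^2_{\cdot 0} + (s - 1)r'\,\sigma^2_{0\cdot}$, 
           $\mathscr{E}(S_e^*) = (n - r - s + 1)\sigma^2_{\cdot\cdot} - (s - s')\,\sigma^2_{\cdot 0} - (r - r')\,\sigma^2_{0\cdot}$. 

Note that if r' = r, and s' = s, these equations reduce to (8.6.12) and 
(8.6.11). 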
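
The mean values for a balanced incomplete matrix sample can be confirmed exactly in a small case. The sketch below is illustrative and rests on several assumptions that are not in the text: u and v are uniform over the rows and columns of a hypothetical 2 × 2 table `X`, sampling is with replacement, and the selection G* consists of the off-diagonal cells of a 3 × 3 array, so that r = s = 3, r' = s' = 2, n = 6.

```python
from fractions import Fraction
from itertools import product

def variance_components(X):
    """mu and sigma^2_{.0}, sigma^2_{0.}, sigma^2_{..} for u, v uniform on X."""
    n1, n2 = len(X), len(X[0])
    mu = Fraction(sum(map(sum, X)), n1 * n2)
    mu_u = [Fraction(sum(X[i]), n2) for i in range(n1)]
    mu_v = [Fraction(sum(X[i][j] for i in range(n1)), n1) for j in range(n2)]
    var_row = sum((m - mu) ** 2 for m in mu_u) / n1
    var_col = sum((m - mu) ** 2 for m in mu_v) / n2
    var_res = sum((X[i][j] - mu_u[i] - mu_v[j] + mu) ** 2
                  for i in range(n1) for j in range(n2)) / (n1 * n2)
    return mu, var_row, var_col, var_res

def incomplete_statistics(x, G, r, s):
    """Mean and S*_row, S*_col, S*_e of (8.6.37); x maps each cell of G to a value."""
    n = len(G)
    grand = Fraction(sum(x[c] for c in G), n)
    rows = {xi: [x[c] for c in G if c[0] == xi] for xi in range(r)}
    cols = {eta: [x[c] for c in G if c[1] == eta] for eta in range(s)}
    rm = {xi: Fraction(sum(v), len(v)) for xi, v in rows.items()}
    cm = {eta: Fraction(sum(v), len(v)) for eta, v in cols.items()}
    s_tot = sum((x[c] - grand) ** 2 for c in G)
    s_row = sum((rm[c[0]] - grand) ** 2 for c in G)
    s_col = sum((cm[c[1]] - grand) ** 2 for c in G)
    return grand, s_row, s_col, s_tot - s_row - s_col

def exact_means(X, G, r, s):
    """Average the statistics over every equally likely draw of the u's and v's."""
    n1, n2 = len(X), len(X[0])
    acc, second, count = [Fraction(0)] * 4, Fraction(0), 0
    for us in product(range(n1), repeat=r):
        for vs in product(range(n2), repeat=s):
            x = {(xi, eta): X[us[xi]][vs[eta]] for xi, eta in G}
            st = incomplete_statistics(x, G, r, s)
            acc = [a + b for a, b in zip(acc, st)]
            second += st[0] ** 2
            count += 1
    means = [a / count for a in acc]
    return means, second / count - means[0] ** 2

X = [[0, 3], [5, 1]]                                   # hypothetical x(u, v)
r, s, rp, sp = 3, 3, 2, 2                              # rp, sp stand for r', s'
G = [(xi, eta) for xi in range(r) for eta in range(s) if xi != eta]
n = len(G)
mu, v_row, v_col, v_res = variance_components(X)
(e_mean, e_row, e_col, e_res), var_mean = exact_means(X, G, r, s)
print(e_res, (n - r - s + 1) * v_res - (s - sp) * v_row - (r - rp) * v_col)
```

Note that the mean value of the residual sum of squares need not be positive when the row and column components are large relative to the residual component.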

Balanced incomplete matrix samples in their most general form from 
third- and higher-order matrix samples must satisfy several varieties of 
conditions, and are consequently rather complicated. Here we shall only 
discuss a special third-order case, which, however, is basic in the theory 
of experimental designs. 

In this case we consider a third-order matrix sample with r rows, r 
columns, and r layers, that is, $\{x_{\xi\eta\zeta}\}$, $\xi, \eta, \zeta = 1, \ldots, r$. Our balanced 
incomplete matrix sample $\{x_{\xi\eta\zeta}\}^*$ will be a selection from $\{x_{\xi\eta\zeta}\}$ consisting 
of one element from each combination of values of $\xi, \eta$, one element from 
each combination of values of $\xi, \zeta$, and one element from each 
combination of values of $\eta, \zeta$. 

The number of elements in the selection $\{x_{\xi\eta\zeta}\}^*$ is $r^2$, thus comprising 
only a skeleton selection from $\{x_{\xi\eta\zeta}\}$. However, the balance incorporated 
into $\{x_{\xi\eta\zeta}\}^*$ is quite remarkable. In fact, if the elements in $\{x_{\xi\eta\zeta}\}^*$ are all 
projected onto the $\xi\eta$-"plane" (the row-column plane) it will be noted 
that in every row of the projected form of $\{x_{\xi\eta\zeta}\}^*$ one finds one and only 
one element from each of the r layers of $\{x_{\xi\eta\zeta}\}$. Similarly, in every column 
of the $\xi\eta$-"plane" one finds one and only one element from each of the 
r layers of $\{x_{\xi\eta\zeta}\}$. Furthermore, similar statements hold if $\{x_{\xi\eta\zeta}\}^*$ is 
projected onto the $\xi\zeta$-"plane" or the $\eta\zeta$-"plane," by suitable interchange 
of the words "row," "column," and "layer." Each of these projected 
forms of $\{x_{\xi\eta\zeta}\}^*$ may be called a Latin-Square selection of rows, columns, 
and layers. 

Now let G* be the set of values of $(\xi, \eta, \zeta)$ used in the particular 
selection of the incomplete sample. Needless to say, there are many choices 
of elements for G*. The reader interested in the construction of sets G* 
(the construction of Latin Squares) is referred to Bose (1938) and Mann 
(1943). 
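
One simple way to obtain a set G* is the cyclic rule $\zeta = \xi + \eta \pmod r$. The sketch below is illustrative only (it is one construction among many, not the method of the cited papers); it builds that selection with 0-based indices and checks the defining property that each pair of values occurs exactly once in each of the three coordinate "planes".

```python
def cyclic_latin_selection(r):
    """G* = {(xi, eta, (xi + eta) mod r)}: one hypothetical Latin-Square selection."""
    return [(xi, eta, (xi + eta) % r) for xi in range(r) for eta in range(r)]

def is_latin_selection(G, r):
    """True if G has r*r elements and every pair of index values occurs exactly
    once in each of the (row, column), (row, layer), (column, layer) planes."""
    if len(G) != r * r:
        return False
    full = {(a, b) for a in range(r) for b in range(r)}
    return all({(c[p], c[q]) for c in G} == full
               for p, q in [(0, 1), (0, 2), (1, 2)])

G = cyclic_latin_selection(4)
print(len(G), is_latin_selection(G, 4))
```

Taking every element from a single layer, by contrast, covers the row-column plane but fails the other two projections.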

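We shall be interested only in the sample mean $\bar{x}^*$, and the means 
$\bar{x}^*_{\xi\cdot\cdot}$, $\bar{x}^*_{\cdot\eta\cdot}$, and $\bar{x}^*_{\cdot\cdot\zeta}$ of the elements in the $\xi$th row, $\eta$th column, and $\zeta$th layer 
respectively, of the incomplete sample, and in 

(8.6.40)   $S^*_{\cdot 00} = \sum^* (\bar{x}^*_{\xi\cdot\cdot} - \bar{x}^*)^2$, 
           $S^*_{0\cdot 0} = \sum^* (\bar{x}^*_{\cdot\eta\cdot} - \bar{x}^*)^2$, 
           $S^*_{00\cdot} = \sum^* (\bar{x}^*_{\cdot\cdot\zeta} - \bar{x}^*)^2$, 
           $S^*_e = S^*_T - S^*_{\cdot 00} - S^*_{0\cdot 0} - S^*_{00\cdot}$, 

where $S^*_T = \sum^* (x_{\xi\eta\zeta} - \bar{x}^*)^2$ and $\sum^*$ denotes summation over all $(\xi, \eta, \zeta) \in G^*$. 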

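Now the variance of $\bar{x}^*$ and the mean values of the $S^*$'s in (8.6.40) are 
all linear functions of only four of the types of quantities listed in 
(8.6.33), namely those with the values $\sigma^2$, $\sigma^2_{\cdot 00}$, $\sigma^2_{0\cdot 0}$, and $\sigma^2_{00\cdot}$, since two 
distinct elements of a Latin-Square selection never agree in exactly two of 
their three indices. Carrying out the mean value operations, omitting the details, 
we find: 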

^(S-w) = (r - l)a% + (r - l)ff*oo 
^(So*o) = (r - 1)4 + (r - l)alo 
(8.6.41) ^( 5 ^ ) = (r - 1)4 + (»• - 1)4- 

= (r - l)(r - 2)4 + (.r- 1)®[<t^oo + + 4-]. 

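where $\sigma^2_e$ satisfies 

(8.6.42)   $\sigma^2 = \sigma^2_e + \sigma^2_{\cdot 00} + \sigma^2_{0\cdot 0} + \sigma^2_{00\cdot}.$ 

It should be noted from (8.6.27) and (8.6.28) that $\sigma^2_e$ is the sum of four 
components of $\sigma^2$, namely, 

(8.6.43)   $\sigma^2_e = \sigma^2_{\cdot\cdot 0} + \sigma^2_{\cdot 0\cdot} + \sigma^2_{0\cdot\cdot} + \sigma^2_{\cdot\cdot\cdot}.$ 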

8.7 SAMPLING THEORY OF ORDER STATISTICS 

Suppose $(x_1, \ldots, x_n)$ is a sample from a population having a continuous 
c.d.f. $F(x)$. Let $x_1, \ldots, x_n$ be rearranged in order from least to greatest 
and let the ordered values be $x_{(1)}, \ldots, x_{(n)}$, where $x_{(1)} < \cdots < x_{(n)}$. These 
new random variables are called the order statistics of the sample; $x_{(k)}$ is 
called the kth order statistic. Note that since $F(x)$ is continuous, the 
probability that any two of the order statistics are equal is zero. The intervals 
$(-\infty, x_{(1)}]$, $(x_{(1)}, x_{(2)}]$, $\ldots$, $(x_{(n)}, +\infty)$ are called sample blocks 
$B_1^{(1)}, \ldots, B_1^{(n+1)}$, respectively, and 
the functions $F(x_{(1)})$, $F(x_{(2)}) - F(x_{(1)})$, $\ldots$, $1 - F(x_{(n)})$ of these blocks are 
called coverages $u_1, \ldots, u_{n+1}$, respectively. Since the sum of the u's is 1, 
we shall ordinarily omit $u_{n+1}$. The subscript on the B's denotes 
dimensionality. Sample blocks of two or more dimensions will be defined in 
Section 8.7(c). Note that a coverage for a given sample block is merely 
the amount of probability in the population distribution (P-measure) 
contained in that sample block, and is a random variable. 

The sampling theory of order statistics and of coverages is of 
fundamental importance in nonparametric statistical inference, which is discussed 
in Chapters 11 and 14. In this section we shall consider some of the 
more basic results in the sampling theory of order statistics. The reader 
interested in more details in this field should consult the books by Fraser 
(1957), Gumbel (1958), and Kendall (1953). An extensive bibliography 
of publications in this field has been published by Savage (1962). 

(a) Sampling Distributions of Order Statistics 

As stated in 7.1.1, if a random variable x has a continuous c.d.f. $F(x)$, 
the random variable y, where $y = F(x)$, has the rectangular distribution 
$R(\tfrac{1}{2}, 1)$. Now let $y_\xi = F(x_\xi)$, $\xi = 1, \ldots, n$. Then $(y_1, \ldots, y_n)$ is a sample 
of size n from the rectangular distribution $R(\tfrac{1}{2}, 1)$. 

The p.e. of these y's is 

(8.7.1)    $1 \cdot dy_1 \cdots dy_n$ 

over the unit "cube" $\{(y_1, \ldots, y_n): 0 < y_\xi < 1,\ \xi = 1, \ldots, n\}$ in 
the sample space $R_n$ and $0 \cdot dy_1 \cdots dy_n$ outside the cube. Let $y_{(1)} < \cdots 
< y_{(n)}$ be the order statistics of the sample $(y_1, \ldots, y_n)$ and consider the 
problem of determining the p.e. of these order statistics. The probability 
that two or more of the elements of the sample $(y_1, \ldots, y_n)$ are equal is 
zero. So we may consider only points in the unit cube whose coordinates 
are all distinct. Suppose $(y_1, \ldots, y_n)$ is such a point P. The p.e. associated 
with point P is $1 \cdot dy_1 \cdots dy_n$. But a total of n! points in the unit cube 
are obtained by permuting the coordinates of P. If new random variables 
are formed by letting $y_{(1)}$ be the smallest coordinate of these n! points, 
$y_{(2)}$ the next smallest, and so on, then the p.d.f. of the random variable 
$(y_{(1)}, \ldots, y_{(n)})$, whose components are the order statistics of the sample 
$(y_1, \ldots, y_n)$, will be given by adding the p.d.f.'s of these n! points and 
using the order statistic notation for the random variables. This gives as 
the p.e. of $(y_{(1)}, \ldots, y_{(n)})$ 

(8.7.2)    $n!\, dy_{(1)} \cdots dy_{(n)}$ 

inside the region $\{(y_{(1)}, \ldots, y_{(n)}): 0 < y_{(1)} < \cdots < y_{(n)} < 1\}$, and 
$0 \cdot dy_{(1)} \cdots dy_{(n)}$ outside. This is the ordered n-variate Dirichlet 
distribution $D^*(1, \ldots, 1; 1)$ defined in (7.7.17). Therefore we have the following 
result: 


8.7.1 If $(x_{(1)}, \ldots, x_{(n)})$ are the order statistics of a sample from a 
continuous c.d.f. $F(x)$, the random variables $F(x_{(1)}), \ldots, F(x_{(n)})$ have 
the ordered n-variate Dirichlet distribution $D^*(1, \ldots, 1; 1)$. 

The distribution of any subset of one or more of the random variables 
$F(x_{(1)}), \ldots, F(x_{(n)})$ follows at once by application of 7.7.6. Thus, for the 
case of one of these random variables we have 

8.7.2 The random variable $F(x_{(k)})$, $1 \le k \le n$, has the beta distribution 
$Be(k, n - k + 1)$. 
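
Statement 8.7.2 can be checked against a direct counting argument: $F(x_{(k)}) \le p$ exactly when at least k of the n values $F(x_\xi)$ are at most p, and these values form a sample from the rectangular distribution on (0, 1). The sketch below is an illustration in exact rational arithmetic; the function names are hypothetical, and the beta c.d.f. is obtained by integrating the density term by term, which is valid here because both parameters are positive integers.

```python
from fractions import Fraction
from math import comb, factorial

def beta_cdf(p, a, b):
    """Exact c.d.f. of Be(a, b) at p for positive integers a, b."""
    const = Fraction(factorial(a + b - 1), factorial(a - 1) * factorial(b - 1))
    # Expand (1 - t)^(b - 1) and integrate t^(a - 1 + m) from 0 to p.
    integral = sum(Fraction((-1) ** m * comb(b - 1, m)) * p ** (a + m) / (a + m)
                   for m in range(b))
    return const * integral

def order_statistic_cdf(p, n, k):
    """P(F(x_(k)) <= p): at least k of n independent uniform values are <= p."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))

n, k, p = 7, 3, Fraction(2, 5)
print(order_statistic_cdf(p, n, k), beta_cdf(p, k, n - k + 1))
```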

For the case of s of these random variables we can state the following 
result: 

8.7.3 The random variables $F(x_{(k_1)})$, $F(x_{(k_1+k_2)})$, $\ldots$, $F(x_{(k_1+\cdots+k_s)})$ have 
the ordered s-variate Dirichlet distribution $D^*(k_1, \ldots, k_s;\ n - k_1 - 
\cdots - k_s + 1)$. 

It should be noted that in 8.7.1, 8.7.2, and 8.7.3 we assumed only that 
$F(x)$ is continuous. Now if we make the somewhat stronger assumption 
that a p.d.f. $f(x)$ exists, we can determine p.e.'s of the order statistics 
$x_{(1)}, \ldots, x_{(n)}$ and subsets of them. Thus, corresponding to 8.7.1, 8.7.2, 
and 8.7.3, we have the following: 

8.7.1a If $(x_{(1)}, \ldots, x_{(n)})$ are the order statistics of a sample of size n from 
a population having c.d.f. $F(x)$ and p.d.f. $f(x)$, then the p.e. of the 
order statistics is 

(8.7.3)    $n!\, f(x_{(1)}) \cdots f(x_{(n)})\, dx_{(1)} \cdots dx_{(n)}$, 

the sample space of $(x_{(1)}, \ldots, x_{(n)})$ being the region in $R_n$ for 
which $-\infty < x_{(1)} < \cdots < x_{(n)} < +\infty$. 

8.7.2a The p.e. of $x_{(k)}$, $1 \le k \le n$, under the conditions stated in 8.7.1a is 

(8.7.4)    $\frac{\Gamma(n + 1)}{\Gamma(k)\,\Gamma(n - k + 1)}\,[F(x_{(k)})]^{k-1}\,[1 - F(x_{(k)})]^{n-k}\, f(x_{(k)})\, dx_{(k)}$, 

the sample space of $x_{(k)}$ being $R_1$. 

8.7.3a The p.e. of $x_{(k_1)}, x_{(k_1+k_2)}, \ldots, x_{(k_1+\cdots+k_s)}$ under the conditions 
stated in 8.7.1a is 

(8.7.5)    $\frac{\Gamma(n + 1)}{\Gamma(k_1) \cdots \Gamma(k_s)\,\Gamma(n - k_1 - \cdots - k_s + 1)}\,[F(x_{(k_1)})]^{k_1 - 1}\,[F(x_{(k_1+k_2)}) - F(x_{(k_1)})]^{k_2 - 1} \cdots$ 
           $\cdot\,[1 - F(x_{(k_1+\cdots+k_s)})]^{n - k_1 - \cdots - k_s}\, f(x_{(k_1)}) \cdots f(x_{(k_1+\cdots+k_s)})\, dx_{(k_1)} \cdots dx_{(k_1+\cdots+k_s)}$, 

the sample space of $(x_{(k_1)}, \ldots, x_{(k_1+\cdots+k_s)})$ being the 
region in $R_s$ for which $-\infty < x_{(k_1)} < \cdots < x_{(k_1+\cdots+k_s)} < +\infty$. 

Formulas (8.7.3), (8.7.4), and (8.7.5) were originally obtained by Craig 
(1932), and they have many interesting special cases. For instance, if in 
(8.7.4) we put k = 1 we obtain the p.e. of the smallest element in the sample, 
and for k = n we find the p.e. of the largest element in the sample. If n is 
odd and k = (n + 1)/2, we obtain the p.e. of the sample median. If, in 
(8.7.5), s = 2, $k_1 = 1$, $k_2 = n - 1$, we obtain the p.e. of the largest and 
smallest elements in the sample: 

(8.7.6)    $n(n - 1)\,[F(x_{(n)}) - F(x_{(1)})]^{n-2}\, f(x_{(1)})\, f(x_{(n)})\, dx_{(1)}\, dx_{(n)}.$ 

The distribution of the sample range w can be obtained by applying the 
transformation 

           $x_{(1)} = v$ 
           $x_{(n)} = v + w$ 

to (8.7.6) and taking the marginal distribution of w, that is, integrating 
out the variable v. 
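
Carrying out the integration just described gives the following worked form, written here as a sketch; the symbol $g(w)$ for the p.d.f. of the range is introduced only for this illustration. The transformation has Jacobian 1, so (8.7.6) yields the first line, and the second line is the special case of the rectangular distribution on (0, 1).

```latex
% p.e. of the range w, from (8.7.6) with x_{(1)} = v, x_{(n)} = v + w:
g(w)\,dw \;=\; n(n-1)\left\{\int_{-\infty}^{\infty}
      \bigl[F(v+w)-F(v)\bigr]^{\,n-2} f(v)\,f(v+w)\,dv\right\}dw ,
      \qquad w > 0 .

% Rectangular case, F(x) = x and f(x) = 1 on (0,1): the integrand equals
% w^{n-2} for 0 < v < 1 - w and vanishes otherwise, so
g(w) \;=\; n(n-1)\,w^{\,n-2}(1-w), \qquad 0 < w < 1 .
```

In the rectangular case this is the p.d.f. of the beta distribution $Be(n - 1, 2)$, in agreement with the fact that the range is then the sum of the n - 1 interior coverages.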


(b) Sampling Distributions of One-Dimensional Coverages 

The distributions occurring in 8.7.1, 8.7.2, and 8.7.3 have a wider 
interpretation than may be immediately apparent from the preceding 
statements. Suppose we make the transformation 

(8.7.7)    $y_{(1)} = u_1$ 
           $y_{(2)} = u_1 + u_2$ 
           $\cdots$ 
           $y_{(n)} = u_1 + \cdots + u_n.$ 

The p.e. of the random variables $u_1, \ldots, u_n$ is 

(8.7.8)    $n!\, du_1 \cdots du_n$ 

in the simplex $S_n$: $\{(u_1, \ldots, u_n): u_\xi > 0,\ \xi = 1, \ldots, n,\ \sum_\xi u_\xi < 1\}$ and 
$0 \cdot du_1 \cdots du_n$ outside $S_n$. This is the p.e. of the n-variate Dirichlet 
distribution $D(1, \ldots, 1; 1)$ defined in (7.7.1). But note that the random 
variables $u_1, \ldots, u_n$ are the coverages defined in the first paragraph of 
Section 8.7. Furthermore, the distribution of the coverages does not 
depend on the population c.d.f. $F(x)$, which is assumed to be continuous. 
For this reason, the coverages behave in a distribution-free manner. 

We may summarize as follows: 

8.7.4 Let $(x_{(1)}, \ldots, x_{(n)})$ be the order statistics of a sample from a 
continuous c.d.f. $F(x)$. Then the coverages $u_1 = F(x_{(1)})$, $u_2 = F(x_{(2)}) 
- F(x_{(1)})$, $\ldots$, $u_n = F(x_{(n)}) - F(x_{(n-1)})$ are random variables 
having the n-variate Dirichlet distribution $D(1, \ldots, 1; 1)$. 

Note that the distribution of the coverages given in 8.7.4 is completely 
symmetrical in the variables. This fact leads us to refer sometimes to the 
sample blocks $B_1^{(1)}, \ldots, B_1^{(n+1)}$ corresponding to the coverages $u_1, \ldots, u_{n+1}$, 
where $u_{n+1} = 1 - u_1 - \cdots - u_n$, respectively, as statistically equivalent 
blocks. It follows from this fact and the results of Section 7.7 on marginal 
distributions of the Dirichlet distribution that 

8.7.5 Any k (k < n) of the coverages listed in 8.7.4 have the k-variate 
Dirichlet distribution $D(1, \ldots, 1;\ n - k + 1)$. 

Applying 7.7.4 we also have 

8.7.6 The sum of any k of the coverages listed in 8.7.4 has the beta 
distribution $Be(k, n - k + 1)$. 

It will be observed that 8.7.2 is a special case of 8.7.6 since $F(x_{(k)})$ is 
the sum of the first k coverages $u_1, \ldots, u_k$. 

Finally, we point out that application of 7.7.5 leads to the following 
result: 

8.7.7 If $v_1, \ldots, v_s$ are sums of $k_1, \ldots, k_s$, respectively, of the coverages 
listed in 8.7.4, where no coverage belongs to more than one v, then 
the distribution of $v_1, \ldots, v_s$ is the s-variate Dirichlet distribution 
$D(k_1, \ldots, k_s;\ n - k_1 - \cdots - k_s + 1)$. 
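
The distribution-free behavior of the coverages can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with c.d.f. $1 - e^{-x}$ (any continuous c.d.f. would serve), the seed and sample sizes are arbitrary, and only the means implied by 8.7.4 and 8.7.6 are examined, namely $1/(n + 1)$ for each coverage and $k/(n + 1)$ for a sum of k of them.

```python
import math
import random

def coverages(sample, cdf):
    """Coverages u_1, ..., u_{n+1} of the blocks cut off by the ordered sample."""
    probs = [cdf(x) for x in sorted(sample)]
    edges = [0.0] + probs + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]

def exp_cdf(x):
    """c.d.f. of the (assumed) exponential population with unit mean."""
    return 1.0 - math.exp(-x)

rng = random.Random(12345)          # fixed seed so the run is reproducible
n, reps = 5, 20000
totals = [0.0] * (n + 1)
first_two = 0.0
for _ in range(reps):
    u = coverages([rng.expovariate(1.0) for _ in range(n)], exp_cdf)
    totals = [t + ui for t, ui in zip(totals, u)]
    first_two += u[0] + u[1]
mean_cov = [t / reps for t in totals]
mean_first_two = first_two / reps
print([round(m, 3) for m in mean_cov], round(mean_first_two, 3))
```

Replacing the exponential population by any other continuous distribution, together with its own c.d.f., should leave these averages essentially unchanged.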

(c) Sampling Distributions of Multidimensional Coverages 

Thus far we have only considered order statistics in samples from a 
one-dimensional distribution. The results which have been stated in 
8.7.4, 8.7.5, 8.7.6, and 8.7.7 hold for an extension of the definition of 
sample blocks and coverages to two or more dimensions. Let us consider 
the case of two dimensions. We assume that $(x_{1\xi}, x_{2\xi};\ \xi = 1, \ldots, n)$ is 
a sample from a continuous two-dimensional c.d.f. $F(x_1, x_2)$. We 
introduce an ordering function $h(x_1, x_2)$ such that $w = h(x_1, x_2)$ is a random 
variable which has a continuous c.d.f. $H(w)$. Then the random variables 
$w_\xi = h(x_{1\xi}, x_{2\xi})$, $\xi = 1, \ldots, n$, constitute a sample from a population 
whose distribution is that of $h(x_1, x_2)$ and this sample can be ordered. Let 
the order statistics for the sample $(w_1, \ldots, w_n)$ be $(w_{(1)}, \ldots, w_{(n)})$. The 
coverages $u'_1, \ldots, u'_{n+1}$, where $u'_{n+1} = 1 - u'_1 - \cdots - u'_n$ and 

           $u'_1 = H(w_{(1)})$,  $u'_2 = H(w_{(2)}) - H(w_{(1)})$,  $\ldots$,  $u'_n = H(w_{(n)}) - H(w_{(n-1)})$, 

are random variables associated, respectively, with the two-dimensional 
sample blocks $B_2^{(1)}, \ldots, B_2^{(n+1)}$ into which the $x_1x_2$-plane is decomposed 
by the ordering "curves" $w_{(\xi)} = h(x_1, x_2)$, $\xi = 1, \ldots, n$, as illustrated in 
Fig. 8.1. 

It is evident from the foregoing discussion how one would define 
coverages $u'_1, \ldots, u'_{n+1}$ for k-dimensional sample blocks $B_k^{(1)}, \ldots, B_k^{(n+1)}$ 
produced by a k-variate ordering function $h(x_1, \ldots, x_k)$ having a continuous 
c.d.f. $H(w)$. If $w_\xi = h(x_{1\xi}, \ldots, x_{k\xi})$, $\xi = 1, \ldots, n$, and if $(w_{(1)}, \ldots, w_{(n)})$ 
are the order statistics corresponding to $(w_1, \ldots, w_n)$, then since 
$(w_{(1)}, \ldots, w_{(n)})$ are order statistics from a population whose distribution 
is that of $h(x_1, \ldots, x_k)$, we have the following result: 

8.7.8 The coverages $u'_1, \ldots, u'_n$ for the k-dimensional blocks defined by 
the ordering function $h(x_1, \ldots, x_k)$ have the same distribution 
properties as those possessed by the coverages $u_1, \ldots, u_n$ and stated 
in 8.7.4, 8.7.5, 8.7.6, and 8.7.7. 
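
A concrete two-dimensional case may help. In the sketch below, which is an illustration and not an example from the text, the population is assumed to consist of two independent standard normal coordinates and the ordering function is taken as $h(x_1, x_2) = x_1^2 + x_2^2$, so that the sample blocks are a disc and concentric rings and $H(w) = 1 - e^{-w/2}$. The simulation looks only at the mean coverage, which should be close to $1/(n + 1)$ for every block.

```python
import math
import random

def h(x1, x2):
    """Hypothetical ordering function: squared distance from the origin."""
    return x1 * x1 + x2 * x2

def H(w):
    """c.d.f. of h(x1, x2) for two independent standard normal coordinates."""
    return 1.0 - math.exp(-w / 2.0)

def ring_coverages(points):
    """Coverages u'_1, ..., u'_{n+1} of the disc, the rings, and the outer region."""
    w = sorted(h(x1, x2) for x1, x2 in points)
    edges = [0.0] + [H(v) for v in w] + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]

rng = random.Random(2024)            # fixed seed so the run is reproducible
n, reps = 4, 20000
totals = [0.0] * (n + 1)
for _ in range(reps):
    pts = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    totals = [t + u for t, u in zip(totals, ring_coverages(pts))]
mean_cov = [t / reps for t in totals]
print([round(m, 3) for m in mean_cov])
```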

The notion of coverages for two or more dimensions is more general 
than it may appear at first. Wald (1943) gave a method of constructing 
coverages based on rectangular sample blocks, which was subsequently 
generalized by Tukey (1947). 

We shall consider Tukey's method for the case of two dimensions. 
Instead of using only one ordering function $h(x_1, x_2)$ for the n sample 
points $(x_{1\xi}, x_{2\xi})$, $\xi = 1, \ldots, n$, we can use as many as n ordering functions, 
$h_\eta(x_1, x_2)$, $\eta = 1, \ldots, n$. It is assumed that each of the functions $h_\eta(x_1, x_2)$ 
has a continuous c.d.f., the c.d.f. of the population distribution being a 
continuous c.d.f. $F(x_1, x_2)$. Some or all of the functions can be identical, 
of course. If we let $w_{\eta\xi} = h_\eta(x_{1\xi}, x_{2\xi})$, then for each value of $\eta$ we would 
have a set of order statistics $w_{\eta(1)}, \ldots, w_{\eta(n)}$. We now cut up the 
$x_1x_2$-plane as follows: The curve $w_{1(1)} = h_1(x_1, x_2)$ cuts the $x_1x_2$-plane into 
two point sets $B_2^{(1)}$ and $\bar{B}_2^{(1)}$, where $B_2^{(1)}$ is the set for which $h_1(x_1, x_2) < w_{1(1)}$. 
We define our first coverage, say $u_1^*$, as the P-measure (the probability 
determined from $F(x_1, x_2)$) of the sample block $B_2^{(1)}$. The curve on which 
$h_2(x_1, x_2)$ equals its smallest value among the n - 1 sample points lying in 
$\bar{B}_2^{(1)}$ cuts the set $\bar{B}_2^{(1)}$ into two sets, $B_2^{(2)}$ and $\bar{B}_2^{(2)}$, such that on $B_2^{(2)}$ 
$h_2(x_1, x_2)$ is less than that smallest value. Let coverage $u_2^*$ be the P-measure of $B_2^{(2)}$. Continuing 
this process, we successively define sets $B_2^{(1)}, B_2^{(2)}, \ldots, B_2^{(n)}$ such that 
$B_2^{(2)} \subset \bar{B}_2^{(1)}$, $\ldots$, $B_2^{(n)} \subset \bar{B}_2^{(n-1)}$. There will, of course, be a residual set 
$B_2^{(n+1)}$ such that $B_2^{(1)} \cup \cdots \cup B_2^{(n)} \cup B_2^{(n+1)} = R_2$. The P-measures of 
$B_2^{(1)}, \ldots, B_2^{(n)}$ are our general coverages $u_1^*, \ldots, u_n^*$. Figure 8.2 shows an 
example for n = 10 where the ordering functions $h_\eta(x_1, x_2)$, $\eta = 1, \ldots, 10$, 
are straight lines making angles of $(\eta - 1)90°$ with the $x_1$-axis. 

Fig. 8.2 Example of two-dimensional sample blocks generated by horizontal and 
vertical lines, taken alternately, as graphs of ordering functions. 
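
The sequential construction can be sketched in code for a simple special case. The following is an illustration under assumptions that are not in the text: the population is uniform on the unit square (so that the P-measure of a block is its area), and the ordering functions alternate between $x_1$ and $x_2$, which corresponds to cutting with vertical and horizontal lines in turn. The function name is hypothetical.

```python
import random

def tukey_coverages(points):
    """Coverages of sequentially cut blocks for points in the unit square,
    assuming a uniform population so that coverage equals area.

    At each step the remaining point with the smallest x1 (even steps) or the
    smallest x2 (odd steps) cuts a block off the region not yet removed.
    """
    remaining = list(points)
    low = [0.0, 0.0]                 # lower-left corner of the region still uncut
    out = []
    step = 0
    while remaining:
        axis = step % 2              # 0: vertical cutting line, 1: horizontal
        cut_point = min(remaining, key=lambda p: p[axis])
        other = 1 - axis
        out.append((cut_point[axis] - low[axis]) * (1.0 - low[other]))
        low[axis] = cut_point[axis]
        remaining.remove(cut_point)
        step += 1
    out.append((1.0 - low[0]) * (1.0 - low[1]))   # residual block
    return out

rng = random.Random(7)               # fixed seed so the run is reproducible
n, reps = 4, 20000
totals = [0.0] * (n + 1)
for _ in range(reps):
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    totals = [t + u for t, u in zip(totals, tukey_coverages(pts))]
mean_cov = [t / reps for t in totals]
print([round(m, 3) for m in mean_cov])
```

Each average should be close to $1/(n + 1)$, the value implied for statistically equivalent blocks, even though the blocks themselves have quite different shapes.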

The definition of k-variate analogues of the coverages $u_1^*, \ldots, u_n^*$ is 
straightforward and is left to the reader. The following statement 
concerning these coverages holds: 

8.7.9 The coverages $u_1^*, \ldots, u_n^*$ for the k-dimensional blocks (as defined 
above) have the same distribution properties as those possessed by 
the coverages $u_1, \ldots, u_n$ in 8.7.4, 8.7.5, 8.7.6, and 8.7.7. 

To establish 8.7.9 it will be sufficient to show that $u_1^*, \ldots, u_n^*$ have the 
n-variate Dirichlet distribution $D(1, \ldots, 1; 1)$ for k = 2. If $F(x_1, x_2)$ is 
the c.d.f. of the population from which the sample $(x_{1\xi}, x_{2\xi};\ \xi = 1, \ldots, n)$ 
is drawn, the p.e. of this sample, which we denote by $O_n$, is 

(8.7.9)    $\prod_{\xi=1}^{n} dF(x_{1\xi}, x_{2\xi}).$ 

Let $(x_1^{(1)}, x_2^{(1)})$ be the sample element among the n sample elements which 
yields the smallest value of $h_1(x_1, x_2)$ and let $(x_{1\xi_1}^{(1)}, x_{2\xi_1}^{(1)};\ \xi_1 = 1, \ldots, n - 1)$, 
to be denoted by $O_{n-1}$, be the n - 1 sample elements obtained by deleting 
$(x_1^{(1)}, x_2^{(1)})$ from $O_n$. The p.e. of $(x_1^{(1)}, x_2^{(1)})$ and $(x_{1\xi_1}^{(1)}, x_{2\xi_1}^{(1)};\ \xi_1 = 1, \ldots, n - 1)$ 
is 

(8.7.10)   $n \cdot dF(x_1^{(1)}, x_2^{(1)}) \prod_{\xi_1=1}^{n-1} dF(x_{1\xi_1}^{(1)}, x_{2\xi_1}^{(1)}).$ 

Now let $u'_1$ be the coverage associated with the set $B_2^{(1)}$ in the $x_1x_2$-plane 
for which $h_1(x_1, x_2) < h_1(x_1^{(1)}, x_2^{(1)})$. It follows from 8.7.2 that the 
distribution of $u'_1$ is given by the beta distribution $Be(1, n)$, that is, the p.e. of 
$u'_1$ is 

(8.7.11)   $n(1 - u'_1)^{n-1}\, du'_1$ 

where 

(8.7.12)   $u'_1 = \int_{B_2^{(1)}} dF(x_1, x_2)$,   $du'_1 = dF(x_1^{(1)}, x_2^{(1)}).$ 

Now consider the (2n - 2)-dimensional conditional random variable 
$(x_{1\xi_1}^{(1)}, x_{2\xi_1}^{(1)} \mid x_1^{(1)}, x_2^{(1)};\ \xi_1 = 1, \ldots, n - 1)$. The p.e. of this distribution is given 
by the ratio of (8.7.10) to (8.7.11), which reduces to 

(8.7.13)   $\prod_{\xi_1=1}^{n-1} dF^{(1)}(x_{1\xi_1}^{(1)}, x_{2\xi_1}^{(1)})$, 

where 

(8.7.14)   $dF^{(1)}(x_{1\xi_1}^{(1)}, x_{2\xi_1}^{(1)}) = \frac{dF(x_{1\xi_1}^{(1)}, x_{2\xi_1}^{(1)})}{1 - u'_1}.$ 

This means that for given values of $(x_1^{(1)}, x_2^{(1)})$, and hence for a given value 
of $u'_1$, the remaining n - 1 elements of the original sample behave exactly 
like a sample of n - 1 elements from a population having the c.d.f. 

(8.7.15)   $F^{(1)}(x_1, x_2)$,   with   $dF^{(1)}(x_1, x_2) = \frac{dF(x_1, x_2)}{1 - u'_1}$ on $\bar{B}_2^{(1)}$, 

which is obtained by assigning zero probability to $B_2^{(1)}$ and normalizing 
back to unity the portion of the original population distribution contained 
in $\bar{B}_2^{(1)}$. 

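Now we repeat the process and let $(x_1^{(2)}, x_2^{(2)})$ be the sample element 
among the n - 1 elements of $O_{n-1}$ which yields the smallest value of 
$h_2(x_1, x_2)$ and let $(x_{1\xi_2}^{(2)}, x_{2\xi_2}^{(2)};\ \xi_2 = 1, \ldots, n - 2)$, to be denoted by $O_{n-2}$, be 
the n - 2 elements of $O_{n-1}$ obtained by deleting $(x_1^{(2)}, x_2^{(2)})$. The p.e. of 
$(x_1^{(2)}, x_2^{(2)})$ and $(x_{1\xi_2}^{(2)}, x_{2\xi_2}^{(2)};\ \xi_2 = 1, \ldots, n - 2)$ is 

(8.7.16)   $(n - 1)\, dF^{(1)}(x_1^{(2)}, x_2^{(2)}) \prod_{\xi_2=1}^{n-2} dF^{(1)}(x_{1\xi_2}^{(2)}, x_{2\xi_2}^{(2)}).$ 

Let $B_2^{(2)}$ be the subset of $\bar{B}_2^{(1)}$ for which $h_2(x_1, x_2) < h_2(x_1^{(2)}, x_2^{(2)})$ and let 
$u'_2 = \int_{B_2^{(2)}} dF^{(1)}(x_1, x_2)$ be the conditional coverage associated with $B_2^{(2)}$ as 
determined from $F^{(1)}(x_1, x_2)$. It follows as a special case of 8.7.2 that the 
distribution of $u'_2$ has p.e. 

(8.7.17)   $(n - 1)(1 - u'_2)^{n-2}\, du'_2$ 

and the p.e. of $(x_{1\xi_2}^{(2)}, x_{2\xi_2}^{(2)} \mid x_1^{(2)}, x_2^{(2)};\ \xi_2 = 1, \ldots, n - 2)$ is 

(8.7.18)   $\prod_{\xi_2=1}^{n-2} dF^{(2)}(x_{1\xi_2}^{(2)}, x_{2\xi_2}^{(2)})$ 

where 

(8.7.18a)  $dF^{(2)}(x_{1\xi_2}^{(2)}, x_{2\xi_2}^{(2)}) = \frac{dF^{(1)}(x_{1\xi_2}^{(2)}, x_{2\xi_2}^{(2)})}{1 - u'_2}.$ 

Thus, for given values of $(x_1^{(1)}, x_2^{(1)})$ and $(x_1^{(2)}, x_2^{(2)})$ the elements of $O_{n-2}$ 
behave like a sample from a population having c.d.f. 

(8.7.19)   $F^{(2)}(x_1, x_2)$,   with   $dF^{(2)}(x_1, x_2) = \frac{dF^{(1)}(x_1, x_2)}{1 - u'_2}$ on $\bar{B}_2^{(2)}$. 

The distribution of $u'_1$, $u'_2$ and the elements of $O_{n-2}$ has as its p.e. the product of 
(8.7.11), (8.7.17), and (8.7.18). 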

Continuing this process we obtain a sequence of conditional coverages 
$u'_1, \ldots, u'_n$ having a distribution with p.e. 

(8.7.20)   $n!\,(1 - u'_1)^{n-1}(1 - u'_2)^{n-2} \cdots (1 - u'_{n-1})\, du'_1 \cdots du'_n.$ 

But expressed in terms of the coverages $u_\xi^*$, $\xi = 1, \ldots, n$, referred to 
in 8.7.9 we have 

(8.7.21)   $u'_1 = u_1^*$, 
           $u'_2 = \frac{u_2^*}{1 - u_1^*}$, 
           $\cdots$ 
           $u'_n = \frac{u_n^*}{1 - u_1^* - \cdots - u_{n-1}^*}.$ 

Applying this transformation to (8.7.20) we obtain 

(8.7.22)   $n!\, du_1^* \cdots du_n^*.$ 

But this is the p.e. of the n-variate Dirichlet distribution $D(1, \ldots, 1; 1)$, 
which concludes the argument for 8.7.9. 

Further results in the sampling theory of coverages have been obtained 
by Fraser (1951, 1953), by Fraser and Guttman (1956), and by Kemperman 
(1956). In particular, they have shown how to obtain statistically 
equivalent blocks by using ordering functions which can depend on the results 
obtained by ordering functions used earlier in the process of determining 
sample blocks. Extensions of the notion of sample blocks and coverages 
to the case of discontinuous c.d.f.'s have been considered by Tukey (1948). 


8.8 ORDER STATISTICS IN SAMPLES FROM 
FINITE POPULATIONS 


Suppose $\pi_N$ is a finite population whose elements have distinct x-values, 
$x_{o1} < \cdots < x_{oN}$. Let $(x_{(1)}, \ldots, x_{(n)})$ be the order statistics of a 
sample of size $n$ from $\pi_N$. We shall consider the sampling theory of the 
$k$th order statistic $x_{(k)}$. It is evident by combinatorial analysis that 

(8.8.1) $P(x_{(k)} = x_{ot}) = \dfrac{\binom{t-1}{k-1}\binom{N-t}{n-k}}{\binom{N}{n}} = p_{N,n,k}(t)$, say, 

where $t = k, k + 1, \ldots, N - n + k$. Thus, $p_{N,n,k}(t)$ can be viewed 
either as: (i) the p.f. of the random variable $x_{(k)}$, its mass points being 
$x_{ot}$, $t = k, k + 1, \ldots, N - n + k$, or (ii) the p.f. of the random variable $t$, 
that is, the rank of the x-value in $\pi_N$ to which the $k$th order statistic in the 
sample is equal. The random variable $t$ is simpler to handle than the 
random variable $x_{(k)}$ since the mass points of $t$ are located at the successive 
integers $k, k + 1, \ldots, N - n + k$, whereas those of $x_{(k)}$ are located at 
$x_{o,k}, x_{o,k+1}, \ldots, x_{o,N-n+k}$. 

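Moments of $t$ of the factorial variety are quite readily found. In fact, 
using the notation of (3.3.15) we have 

(8.8.2) $\mathcal{E}((t + r - 1)^{[r]}) = \sum_{t=k}^{N-n+k} (t + r - 1)^{[r]}\, p_{N,n,k}(t) = (k + r - 1)^{[r]}\, \dfrac{\binom{N+r}{n+r}}{\binom{N}{n}} \sum_{t=k}^{N-n+k} p_{N+r,n+r,k+r}(t + r).$ 

But the sum on the extreme right has the value 1. Therefore, 

(8.8.3) $\mathcal{E}((t + r - 1)^{[r]}) = (k + r - 1)^{[r]}\, \dfrac{\binom{N+r}{n+r}}{\binom{N}{n}} = \dfrac{(k + r - 1)^{[r]} (N + r)^{[r]}}{(n + r)^{[r]}}.$ 

Putting $r = 1$ and 2, we find the mean and variance of $t$ to be 

(8.8.4) $\mathcal{E}(t) = \dfrac{k(N + 1)}{n + 1}, \qquad \sigma^2(t) = \dfrac{k(N + 1)(N - n)(n - k + 1)}{(n + 1)^2 (n + 2)}.$ 

Summarizing, 

8.8.1 If $\pi_N$ is a finite population whose elements have distinct x-values 
$x_{o1} < \cdots < x_{oN}$ and if $(x_{(1)}, \ldots, x_{(n)})$ are the order statistics of 
a sample from $\pi_N$, the p.f. of $x_{(k)}$ is given by (8.8.1), the mass points 
being $x_{ot}$, $t = k, k + 1, \ldots, N - n + k$. The value of $\mathcal{E}((t + r - 1)^{[r]})$ 
is given by (8.8.3). 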
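
The closed forms in (8.8.4) can be checked against the p.f. (8.8.1) by exact rational arithmetic. The following is a minimal sketch, assuming the sample is drawn without replacement (as the combinatorial form of (8.8.1) implies) and that $x^{[r]}$ denotes the descending factorial; the function names `rank_pf` and `mean_and_variance` and the values of `N`, `n`, `k` are illustrative choices, not taken from the text.

```python
from fractions import Fraction
from math import comb


def rank_pf(N, n, k):
    """p.f. (8.8.1) of the population rank t of the k-th order statistic."""
    total = comb(N, n)
    return {
        t: Fraction(comb(t - 1, k - 1) * comb(N - t, n - k), total)
        for t in range(k, N - n + k + 1)
    }


def mean_and_variance(pf):
    """Exact mean and variance of a p.f. given as {value: probability}."""
    mean = sum(t * p for t, p in pf.items())
    variance = sum((t - mean) ** 2 * p for t, p in pf.items())
    return mean, variance


N, n, k = 20, 5, 2  # illustrative values only
pf = rank_pf(N, n, k)
mean, variance = mean_and_variance(pf)

# Closed forms from (8.8.4)
closed_mean = Fraction(k * (N + 1), n + 1)
closed_variance = Fraction(
    k * (N + 1) * (N - n) * (n - k + 1), (n + 1) ** 2 * (n + 2)
)

print(sum(pf.values()), mean, variance)
```

Because the arithmetic is exact, agreement here is an identity for the chosen values rather than a numerical approximation.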


If we consider the infinite sequence of random variables 

(8.8.5) $u_{N,k} = \dfrac{t}{N}, \qquad N = n, n + 1, \ldots$ 

then $\lim_{N \to \infty} u_{N,k}$ is the proportion of the x-values of an infinite population 
which $x_{(k)}$ exceeds. It is intuitively evident that $\lim_{N \to \infty} u_{N,k}$ corresponds to 
$F(x_{(k)})$ for the case of sampling from an infinite population having c.d.f. 
$F(x)$ and should, in accordance with 8.7.2, have the beta distribution 
$Be(k, n - k + 1)$. 

To see that this is actually true we proceed as follows: 

(8.8.6) $\lim_{N \to \infty} \mathcal{E}\left(\left(\dfrac{t}{N}\right)^r\right) = \lim_{N \to \infty} \dfrac{\mathcal{E}((t + r - 1)^{[r]})}{N^r} = \dfrac{\Gamma(n + 1)\,\Gamma(k + r)}{\Gamma(k)\,\Gamma(n + r + 1)}$ 

which is the $r$th moment of a random variable having the beta distribution 
$Be(k, n - k + 1)$. It now follows from 5.5.3 that the sequence of random 
variables (8.8.5) converges in distribution to this beta distribution. 
Therefore 

8.8.2 Let $\pi_N$, $N = n, n + 1, \ldots$ be a sequence of finite populations such 
that the elements in $\pi_N$ have distinct x-values $x_{o1} < \cdots < x_{oN}$. 
Let $(x_{(1)}, \ldots, x_{(n)})$ be the order statistics of a sample of size $n$ from 
$\pi_N$, and let $u_{N,k}$ denote the proportion of elements in $\pi_N$ having 
x-values exceeded by $x_{(k)}$. The sequence of random variables 
$u_{N,k}$, $N = n, n + 1, \ldots$, converges in distribution to a random 
variable having the beta distribution $Be(k, n - k + 1)$. 
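
The moment convergence behind 8.8.2 can be watched numerically. The sketch below is only an illustration under stated assumptions: it takes the scaled rank $t/N$ as the random variable of the sequence, computes its first three moments directly from the p.f. (8.8.1), and compares them with the moments $\Gamma(n+1)\Gamma(k+r)/[\Gamma(k)\Gamma(n+r+1)]$ of $Be(k, n-k+1)$. The helper names and the values of `n`, `k`, `N` are hypothetical.

```python
from math import comb, gamma


def scaled_rank_moment(N, n, k, r):
    """E((t/N)^r), computed directly from the p.f. (8.8.1)."""
    total = comb(N, n)
    return sum(
        (t / N) ** r * comb(t - 1, k - 1) * comb(N - t, n - k) / total
        for t in range(k, N - n + k + 1)
    )


def beta_moment(n, k, r):
    """r-th moment of the beta distribution Be(k, n - k + 1)."""
    return gamma(n + 1) * gamma(k + r) / (gamma(k) * gamma(n + r + 1))


def largest_gap(N, n, k, orders=(1, 2, 3)):
    """Largest absolute difference over the listed moment orders."""
    return max(
        abs(scaled_rank_moment(N, n, k, r) - beta_moment(n, k, r))
        for r in orders
    )


n, k = 5, 2  # illustrative values only
for N in (10, 100, 1000):
    print(N, largest_gap(N, n, k))
```

The gap shrinks roughly like $1/N$ for these values, which is consistent with the limit in (8.8.6) but is of course not a proof of it.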


PROBLEMS 


8.1 If $(x_1, \ldots, x_n)$ is a random sample from a population whose distribution 
has finite moments $\mu_2, \ldots, \mu_{2r}$ about the distribution mean, and if 

1 ^ 

- - 2 

show that 

<^mr) - ^ [i“«f - - r(r - l)ft 2 t*r-if*r + “ 2 /*r+l)] + 

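8.2 If $\bar{x}_1$, $\bar{x}_2$ are the sample means and $s_1^2$, $s_2^2$ are the sample variances of two 
independent random samples of sizes $n_1$ and $n_2$ from $N(\mu_1, \sigma^2)$ and $N(\mu_2, \sigma^2)$, 
respectively, show that 

$\dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right) \left[\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\right]}}$ 

has the “Student” t distribution $S(n_1 + n_2 - 2)$. 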


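8.3 If $(x_{11}, \ldots, x_{1n})$ and $(x_{21}, \ldots, x_{2n})$ are independent samples from 
$N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively, and if $\bar{x}_1$, $\bar{x}_2$ are the sample means, $s_1^2$, $s_2^2$ are 
the sample variances, and 

$r = \dfrac{1}{(n - 1)s_1 s_2} \sum_{\xi=1}^{n} (x_{1\xi} - \bar{x}_1)(x_{2\xi} - \bar{x}_2)$ 

the correlation coefficient between the two samples, show that 

$\dfrac{[(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)]\sqrt{n}}{[s_1^2 + s_2^2 - 2 r s_1 s_2]^{1/2}}$ 

has the “Student” distribution $S(n - 1)$. 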
8.4 Suppose $\bar{x}_1$ and $\bar{x}_2$ are the means of independent samples of sizes $n_1$ and $n_2$ 
from distributions whose variances are both equal to $\sigma^2$. Let $\bar{x}$ be the mean of the 
two samples pooled together as a sample of size $n_1 + n_2$. Show that the variance 
of $\bar{x} - \bar{x}_1$ is $\sigma^2 n_2/[n_1(n_1 + n_2)]$. 

8.5 If $x$ is a random variable having the distribution $N(\mu, \sigma^2)$ and if $F(x)$ is 
the c.d.f. of this normal distribution show that the correlation coefficient between 
the random variables $x$ and $F(x)$ is $\sqrt{3/\pi}$. 

8.6 If $(x, y)$ is a pair of random variables having an arbitrary two-dimensional 
normal distribution with correlation coefficient $\rho$ and if $F(y)$ is the c.d.f. of $y$, 
show that the correlation coefficient between the random variables $x$ and $F(y)$ is 
$\rho\sqrt{3/\pi}$. 

8.7 Making use of 8.2.2 for r = Xg) s construct Q[ 2 ] (j^i,..., 

for a sample of size n from a distribution having finite first and second moments 

and /i 2 * obtaining an unbiased estimator for Find the variance of 
this estimator. 

8.8 If $(x_1, \ldots, x_n)$ is a simple random sample from a finite population $\pi_N$ of 
size $N$ having variance $\sigma^2$, show for any (real) constants $c_1, \ldots, c_n$ that the 
variance of $c_1 x_1 + \cdots + c_n x_n$ is 

$\left[(c_1^2 + \cdots + c_n^2) - \dfrac{1}{N}(c_1 + \cdots + c_n)^2\right] \sigma^2.$ 

Hence, show that the variance of the difference between the means of two random 
samples of sizes $n_1$ and $n_2$, $n_1 + n_2 \le N$, drawn without replacement from $\pi_N$ is 
$\sigma^2(1/n_1 + 1/n_2)$. 

8.9 Suppose samples of sizes $n_1, \ldots, n_k$ are independent samples from 
identical normal distributions $N(\mu, \sigma^2)$. Let $\bar{x}_1, \ldots, \bar{x}_k$ be the means and 
$s_1^2, \ldots, s_k^2$ the variances of these samples. Let 

$u = \dfrac{1}{\sigma^2}\left[(n_1 - 1)s_1^2 + \cdots + (n_k - 1)s_k^2\right]$ 

$v = \dfrac{1}{\sigma^2}\left[n_1(\bar{x}_1 - \bar{x})^2 + \cdots + n_k(\bar{x}_k - \bar{x})^2\right]$ 

$w = \dfrac{1}{\sigma^2}\left[n(\bar{x} - \mu)^2\right]$ 

where $\bar{x}$ is the mean of all samples pooled together as a single sample, and 
$n = n_1 + \cdots + n_k$. Show by Cochran’s theorem or by characteristic functions 
that $u$, $v$, and $w$ are independent random variables having chi-square distributions 
$C(n_1 + \cdots + n_k - k)$, $C(k - 1)$, and $C(1)$, respectively. 

8.10 Suppose ( 2 / 1 ,..., 2/n) independent random variables from 


N(/Ji + Pxi, <y*),..., NOi + Pxn, cF*) respectively. 
where j3, ..., are constants. Let (i and /f be the values of and /? which 

minimize the sum of squares 

-/< - 

and let 

” = -fi- PxiY- 


Show that (//, p) has the two-dimensional normal distribution N({fx, P}; lk«ll). 
where 



xa^ 

nan 

an 

xa^ 


— - 

— 

On 

an 


anda„ = - xY, 


{ 

and that v has the chi-square distribution C{n — 2). Furthermore, show that 
(/t, p) and V are independent. 


8.11 If (a'l,.. ., .rj is a sample from NQi, o^) show that the characteristic 
function of 


is 





— exp 



where P^ = nd^/a^, and hence that the p.d.f. of u is 




where 2 r(«) is the p.d.f. of the chi-square distribution C(/? + 2r). (The distri¬ 
bution of w, originally obtained by Fisher (1928/?), is known as the w(7/ice/irra/ 
chi-square distribution with // degrees of freedom and parameter p'^.) 

8.12 If /;, X, 5^ are the size, mean, and variance of a sample from A^(/<, <j 2), 
show that the distribution of 

/ = - /< + b)Vn 

s 


has p.d.f. 

where 


/(0 = 


Vn(n - 1) r(J« - i)\ 



g(t) 


« / Vlyt V/, ^ r[J(/i + r)] 

r'ToWw-lM « - 1/ r! r(-|«) 


and y = ‘Yndja. (This distribution obtained by Tang (1938), is known as the 
noncentral Student distribution with n — 1 degrees of freedom and parameter y.) 
8.13 Find the finite-population forms of formulas (8.6.34) and (8.6.35) if the 
numbers of elements in the populations of rows, columns, and layers are $N_1$, $N_2$, 
and $N_3$, respectively. 

8.14 If $(x_1, \ldots, x_n)$ is a sample from the rectangular distribution $R(\tfrac{1}{2}\theta, \theta)$ 
show that the sampling distribution of $u = \max(x_1, \ldots, x_n)$ has p.d.f. 

$\dfrac{n u^{n-1}}{\theta^n}$ 

on $(0, \theta)$ and 0 outside. 

8.15 If a sample of size $(2n + 1)$ is taken from the rectangular distribution 
$R(\tfrac{1}{2}, 1)$ show that the median of this sample has the beta distribution 
$Be(n + 1, n + 1)$. 


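8.16 If $(x_1, \ldots, x_n)$ is a sample from the rectangular distribution $R(\mu, \omega)$ 
show that the p.d.f. of the sample range $r$ is 

$\dfrac{n(n - 1)}{\omega} \left(\dfrac{r}{\omega}\right)^{n-2} \left(1 - \dfrac{r}{\omega}\right)$ 

for $r \in (0, \omega)$ and 0 for $r \notin (0, \omega)$. 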


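8.17 Suppose $(x_{(1)}, \ldots, x_{(n)})$ are the order statistics of a sample of size $n$ 
from a population having a continuous c.d.f. $F(x)$. Show that the covariance 
matrix of $(F(x_{(k_1)}), F(x_{(k_2)}))$, where $1 \le k_1 < k_2 \le n$, is 

$\begin{Vmatrix} \dfrac{F_1(1 - F_1)}{n + 2} & \dfrac{F_1(1 - F_2)}{n + 2} \\[2ex] \dfrac{F_1(1 - F_2)}{n + 2} & \dfrac{F_2(1 - F_2)}{n + 2} \end{Vmatrix}$ 

where 

$F_1 = \dfrac{k_1}{n + 1}, \qquad F_2 = \dfrac{k_2}{n + 1}.$ 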


8.18 If $(x_{(1)}, \ldots, x_{(n)})$ are the order statistics of a sample from a population 
having a continuous c.d.f. $F(x)$, show that the mean value of the range $R = 
x_{(n)} - x_{(1)}$ is given by 

$\mathcal{E}(R) = \int_{-\infty}^{\infty} \{1 - F^n(x) - [1 - F(x)]^n\}\,dx$ 

a result due to Tippett (1925). 

8.19 (Continuation) Show that the c.d.f. of $R$ is given by 

$n \int_{-\infty}^{\infty} \{F(x + R) - F(x)\}^{n-1}\,dF(x).$ 

8.20 In a sample of size $n$ from the rectangular distribution $R(\mu, \omega)$, let 
$x_{(1)}$ and $x_{(n)}$ be the smallest and largest order statistics of the sample. Let 
$r = [\tfrac{1}{2}(x_{(1)} + x_{(n)}) - \mu]/(x_{(n)} - x_{(1)})$; show that the p.e. of $r$ is given by 

$g(r)\,dr = (n - 1)(1 + 2|r|)^{-n}\,dr$ 

on the interval $(-\infty, +\infty)$. [Carlton (1946)]. 





8.21 If are the smallest and largest order statistics of a sample of 

size $n$ from a continuous c.d.f. $F(x)$, show that the random variable 

InV — F{x^n))) has a limit distribution as « oo, with mean and 
variance 4 — [Elfving (1947).] 

8.22 Let (a^d),. . ., be the order statistics of a sample from the p.d.f./(a?) 
where 

1 e~{x X > ft 

cs 

0, X fX 

and let 

1 

= -- 7 2 - •^( 1 ))- 

n — I ^^2 

Show that the p.d.f. g{() of the random variable / defined by 

t = (^(1) - fi)lw 

is given by 

^ lo, t < 0, [Guttman (I960).] 

8.23 If $(x_1, \ldots, x_n)$ is an $n$-dimensional random variable having a p.d.f. 
$f(x_1, \ldots, x_n)$ which is symmetric in $x_1, \ldots, x_n$, show that the order statistics 
$(x_{(1)}, \ldots, x_{(n)})$ of $(x_1, \ldots, x_n)$ have p.d.f. $n!\, f(x_{(1)}, \ldots, x_{(n)})$ in the region for 
which $-\infty < x_{(1)} < \cdots < x_{(n)} < +\infty$, and 0 otherwise. 

8.24 If $(x_1, \ldots, x_n)$ is a sample from the rectangular distribution $R(\tfrac{1}{2}, 1)$ let 

$y = (x_1 \cdots x_n)^{1/n}.$ 

Show that the p.d.f. of $y$ is 

$\dfrac{n^n y^{n-1}}{(n - 1)!} (-\log y)^{n-1}$ on $(0, 1)$ and 0 otherwise. 

8.25 If $m$ independent samples of size $n$ are drawn from the rectangular 
distribution $R(\tfrac{1}{2}, 1)$ and if $u = \prod_{i=1}^{m} u_i$ where $u_1, \ldots, u_m$ are the largest order 
statistics in the $m$ samples respectively, show that the p.d.f. of $u$ is 

$\dfrac{n^m}{(m - 1)!}\, u^{n-1} (-\log u)^{m-1}$ 

on $(0, 1)$ and 0 otherwise, a result due to Rider (1955). 

8.26 If $(x_1, \ldots, x_n)$ is a sample from a gamma distribution $G(\mu)$, and if 
$h(x_1, \ldots, x_n)$ is a random variable such that 

$h(x_1, \ldots, x_n) = h(cx_1, \ldots, cx_n)$ 

for each $c > 0$, show that $(x_1 + \cdots + x_n)$ and $h(x_1, \ldots, x_n)$ are independent. 
[Pitman (1937a)]. 

8.27 If $(x_1, \ldots, x_n)$ is a sample from a p.d.f. $f(x)$ and if, for any set of 
constants $c_1, \ldots, c_n$, not all zero, the ratio $(c_1 x_1 + \cdots + c_n x_n)/(x_1 + \cdots + x_n)$ 
and the sum $(x_1 + \cdots + x_n)$ are independent, show that $f(x)$ is the p.d.f. of a 
gamma distribution. [Laha (1954)]. 
8.28 If $(x_{(1)}, \ldots, x_{(n)})$ are the order statistics of a sample of size $n$ from a 
population having a continuous c.d.f. with finite mean $\mu$ and variance $\sigma^2$, show 
by use of Schwarz’ inequality, that 

$\mathcal{E}(x_{(n)}) \le \mu + \dfrac{(n - 1)\sigma}{\sqrt{2n - 1}},$ 

a slightly different form of a result due to Hartley and David (1954). 

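8.29 If $(x_{(1)}, \ldots, x_{(n)})$ are order statistics in a sample of size $n$ from a 
continuous c.d.f. $F(x)$, show that the random variables 

$\dfrac{F(x_{(1)})}{F(x_{(2)})}, \quad \left[\dfrac{F(x_{(2)})}{F(x_{(3)})}\right]^2, \quad \ldots, \quad \left[\dfrac{F(x_{(n-1)})}{F(x_{(n)})}\right]^{n-1}, \quad [F(x_{(n)})]^n$ 

are independently distributed according to the rectangular distribution $R(\tfrac{1}{2}, 1)$ 
[Malmquist (1950) and Renyi (1953)]. 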

8.30 Suppose (a^i ,,,. is a sample from the rectangular distribution 
i?(Ja), ct> + <5) where co > 0 and 0 < < a>. Let be the random interval 

n 

(x^ — ^(5, x^ + |<5). Let E be the event \J /|, and I the interval (0, a>). Show 
that 1-1 


^(E n /) = 


fl-/ 

1 * T1 

L' 1 

(o+6)_ 


[Robbins (1944a)]. 


8.31 If $n$ points are taken “at random” on a line of length $L$, show that if 
$0 < d < L/(n - 1)$, the probability is $(L - (n - 1)d)^n/L^n$ that no two points 
will be closer together than the distance $d$. [Parzen (1960)]. 

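8.32 If $(x_1, \ldots, x_n)$ is a sample from a normal population having mean $\mu$ and 
variance $\sigma^2$, and if $d$ is the sample mean deviation about $\mu$ defined by 

$d = \dfrac{1}{n} \sum_{\xi=1}^{n} |x_\xi - \mu|$ 

show that 

$\mathcal{E}(d) = \sqrt{2/\pi}\;\sigma, \qquad \sigma^2(d) = \dfrac{\sigma^2}{n}\left(1 - \dfrac{2}{\pi}\right).$ 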
8.33 Let $x$ be a random variable with p.d.f. $f(x)$ having mean $\mu$ and variance 
$\sigma^2$. Let $\varphi(t)$ be the characteristic function of $x$. Let $\bar{x}$ and $s^2$ be the sample mean 
and variance of a sample of size $n$ from $f(x)$. Show that if $\bar{x}$ and $s^2$ are independent 
for any $n$, then $\varphi(t)$ satisfies the differential equation 

$\varphi(t)\,\varphi''(t) - [\varphi'(t)]^2 = -\sigma^2 [\varphi(t)]^2,$ 

that its solution, subject to the boundary conditions $\varphi(0) = 1$ and $\varphi'(0) = i\mu$, is 

$\varphi(t) = e^{i\mu t - \frac{1}{2}\sigma^2 t^2}$ 

and hence that $f(x)$ is the p.d.f. of the normal distribution $N(\mu, \sigma^2)$. [Lukacs 
(1942)]. 
8.34 Suppose $x_{(k)}$ is the $k$th order statistic in a sample of size $m$ from a 
continuous c.d.f. $F(x)$. In a second independent sample of size $n$ from the same 
c.d.f. let $y$ be the random variable denoting the number of x’s in the second 
sample which do not exceed $x_{(k)}$. Show that the p.f. of $y$ is 

$\dfrac{k}{k + y} \cdot \dfrac{\binom{m}{k}\binom{n}{y}}{\binom{m+n}{k+y}}, \qquad y = 0, 1, \ldots, n,$ 

and show that the $r$th factorial moment of $y$ is given by 

$\mathcal{E}(y^{[r]}) = \dfrac{n!\,(k + r - 1)!\,m!}{(k - 1)!\,(m + r)!\,(n - r)!}.$ 

[Epstein (1954)]. 


8.35 (Continuation) Suppose the second sample is drawn one element at a 
time until y elements are less than Show that the size n of the second sample 
is a random variable having the following p.f. 


//n + « - l\ 
\2/ + A: - 1/ 


(w +«) 


n =y,y+ 1,. 
and show that 


^ (y - l)!(;fc - l)!(m-it)! 

r<k-\. [Wilks (1959&)]. 

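8.36 Suppose a finite population $\pi_N$ consists of $N$ chips marked $1, 2, \ldots, N$ 
respectively. In a sample of size $n$ drawn from $\pi_N$ without replacement, let $x$ 
be the largest number drawn in the sample. Show that the p.f. of $x$ is 

$\binom{x-1}{n-1} \Big/ \binom{N}{n}, \qquad x = n, n + 1, \ldots, N,$ 

and that 

$\mathcal{E}(x) = \dfrac{n}{n + 1}(N + 1), \qquad \sigma^2(x) = \dfrac{n(N + 1)(N - n)}{(n + 1)^2 (n + 2)}.$ 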


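8.37 (Continuation) Suppose a sample of size $2n + 1$ is drawn from population 
$\pi_N$ without replacement. Let $y$ be the median of the sample. Show that the 
p.f. of $y$ is 

$\binom{y-1}{n}\binom{N-y}{n} \Big/ \binom{N}{2n+1}, \qquad y = n + 1, \ldots, N - n.$ 

Show that 

$\mathcal{E}(y) = \dfrac{N + 1}{2}, \qquad \sigma^2(y) = \dfrac{(N - 2n - 1)(N + 1)}{8n + 12}.$ 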
8.38 (Continuation) If $s$ and $t$ are the $k_1$th and $k_2$th order statistics ($k_1 < k_2$) 
in a sample of size $n$ drawn from $\pi_N$, show that the p.f. of $(s, t)$ is 

$\binom{s-1}{k_1-1}\binom{t-s-1}{k_2-k_1-1}\binom{N-t}{n-k_2} \Big/ \binom{N}{n}.$ 

Show that 

$\sigma^2(s) = \dfrac{k_1 (N - n)(N + 1)(n - k_1 + 1)}{(n + 1)^2 (n + 2)}, \qquad \mathrm{Cov}(s, t) = \dfrac{k_1 (n - k_2 + 1)(N - n)(N + 1)}{(n + 1)^2 (n + 2)}.$ 


8.39 Suppose and qix) are two discrete distributions having the same 
mass points , xj^. Let and On be independent samples of size m and n 
drawn from pix) and qix) respectively, and let (mj,..., m*), where 
mi + • • • + mfc = m, be the numbers of components of having values at 
a?i,..., aji; respectively, with a similar definition of (wi,..., nj^). If is the 

correlation coefficient between (Vmi/m,..., Vm^/m) and i'^njn .V nj^n), 

show that if /?(«) ^qix), and for an arbitrary A > 0, 



a result due to Matusita (1957). 


HM iContinuation) IfT(V'p(a;^) — > <5*, where <5 > A > 0, show 

that 

/ . t A: - 1 / 1 1 \a 

(<5-v(vm +v;;) 

a result also due to Matusita (1957). 

8.41 Show that the c.d.f. of z for the distribution described in Section 8.3(a) 


Fniz) 


and 


l[^ - (;)(«- ir + (;)(. - 2 )* - ■ + (-i)*(j)(« - ‘r] 

A?<2<A4*1, A = 0, 1,...,/i — 1 
Fniz) « 0 for z <0, 

1 for z > n. 


8.42 Suppose (o^d),..., Xin)) are the order statistics of a sample of size n 
from the rectangular distribution i?(J, 1). Let (wi, ...,«„) be an n-dimensional 
random variable denoting the segments (a^dj, a:( 2 ) — x^y ^,..., X(n\ “ ^(h-d) and 
let V s max (i/i,..., aj. Show that the c.d.f. of v is given by 


Fniv) 


1 - (")(i - vr + Qd - 2»)" - 





1 

k + 1 


1 

<a<^. 


A: = « — 1, « “ 2,..., 1, 
and 

Fn{v) =0 for 1 / < 0 

= 1 for «> 1. 

(The result in the preceding problem is useful in solving this one.) 

8.43 (Continuation), Distribution of largest segment produced by n random 
points on the interval (0, 1). The order statistics (x^), ..., cut the interval 
(0, 1) into « + 1 disjoint segments i/j,..., Un+i where, of course, i/i + • • * + 
Mn+i = 1* Let w = max (ui, ..., «n+i)* Show that the c.d.f. of w is given by 

Fn(.w) = 1^1 - I - H-)" + (” 2 ‘)(1 - 2m.)"- 

+ (-!)*(" \ ^)(1 -*M.)»j 

A: =n, n - 1, ...,1 

and 

Fnbv) =0 for M. < —^ 

= 1 for w > I, 

8.44 If a;( 2 ), are the order statistics of a sample of size four from 

.Ar(0,1) and if y « i(x^^^ + x^^^) - i(x^^^ + x^ 2 )\ show that the c.d.f. G(y) of y 
is given by 

G(y) = ^ ^ ® [Walsh (1946)] 

8.45 If $(x_{(1)}, x_{(2)})$ are the order statistics of a sample of size two from 
$N(\mu, \sigma^2)$, show that their mean values are $(\mu - \sigma/\sqrt{\pi},\ \mu + \sigma/\sqrt{\pi})$. 



CHAPTER 9 


Asymptotic Sampling Theory 
for Large Samples 


The sampling theory presented in Chapter 8 has been based on finite 
samples of size n. We now consider what kinds of results can be obtained 
about sampling distributions if we allow $n \to \infty$. In this case we have a 
population with c.d.f. $F(x)$ and we consider a stochastic process $(x_1, x_2, \ldots)$ 
such that the elements (components) are mutually independent and any 
set of $n$ of them is a sample of size $n$ from the population having c.d.f. $F(x)$. 
This stochastic process is sometimes called simple random sampling from 
an infinite population or random sampling from a probability distribution. 
In this chapter we shall consider some limit theorems and results which are 
useful in approximating distributions of various functions of samples in 
the case of large samples. 


9.1 CONVERGENCE OF SAMPLE MEAN IN PROBABILITY 

One of the simplest results concerning a sample mean for large samples 
is given by the following theorem due to Khintchine (1929): 

9.1.1 If $(x_1, \ldots, x_n)$ is a sample from a c.d.f. $F(x)$ for which $\mathcal{E}(x)$ has a 
finite value $\mu$, then the sample mean $\bar{x}$ converges in probability to $\mu$ 
as $n \to \infty$. 

To prove this, we note that if the mean $\mu$ exists, then it follows from 
(5.1.10) that the characteristic function $\varphi(t)$ corresponding to $F(x)$ may be 
written as 

(9.1.1) $\varphi(t) = 1 + i\mu t + o(t).$ 

Making use of the fact from 8.3.2 that the characteristic function of $\bar{x}$ is 
$[\varphi(t/n)]^n$, we have 

(9.1.2) $\left[\varphi\left(\dfrac{t}{n}\right)\right]^n = \left[1 + \dfrac{i\mu t}{n} + o\left(\dfrac{t}{n}\right)\right]^n$ 

where $n \cdot o(t/n)$ tends to 0 as $n \to \infty$ for any $t$. Allowing $n \to \infty$, we have 

(9.1.3) $\lim_{n \to \infty} \left[\varphi\left(\dfrac{t}{n}\right)\right]^n = e^{i\mu t}$ 

which is the characteristic function of the degenerate c.d.f. $\epsilon(x - \mu)$ 
defined by (2.3.4). Applying 5.4.1 it follows that as $n \to \infty$ the distribution 
of $\bar{x}$ converges to $\epsilon(x - \mu)$, which is equivalent to stating that $\bar{x}$ converges 
in probability to $\mu$ as $n \to \infty$, thus completing the argument for 9.1.1. 

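If it is assumed that both the variance $\sigma^2$ and mean $\mu$ of $x$ exist and are 
finite for the population distribution in 9.1.1, then it can be proved quite 
simply by Chebyshev’s inequality (3.3.5) that $\bar{x}$ converges in probability 
to $\mu$. For we know from 8.2.1 that $\mathcal{E}(\bar{x}) = \mu$ and $\sigma^2(\bar{x}) = \sigma^2/n$. Applying 
(3.3.5) we may say, for any $\epsilon > 0$, that 

(9.1.4) $P(|\bar{x} - \mu| > \epsilon) = P\left(|\bar{x} - \mu| > \dfrac{\epsilon\sqrt{n}}{\sigma} \cdot \dfrac{\sigma}{\sqrt{n}}\right) \le \dfrac{\sigma^2}{n\epsilon^2}.$ 

From this it is clear that 

(9.1.5) $\lim_{n \to \infty} P(|\bar{x} - \mu| > \epsilon) = 0;$ 

that is, $\bar{x}$ converges in probability to $\mu$. 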
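
The Chebyshev bound (9.1.4) is usually far from sharp, which a small exact computation makes visible. The sketch below assumes, purely for illustration, a Bernoulli population with parameter `p` (so $\mu = p$ and $\sigma^2 = p(1-p)$); the function names and the values of `p`, `eps`, and `n` are hypothetical and not part of the text.

```python
from math import comb


def tail_probability(n, p, eps):
    """Exact P(|xbar - p| > eps) for the mean of n Bernoulli(p) observations."""
    return sum(
        comb(n, z) * p**z * (1 - p) ** (n - z)
        for z in range(n + 1)
        if abs(z / n - p) > eps
    )


def chebyshev_bound(n, p, eps):
    """Right-hand side of (9.1.4), sigma^2 / (n eps^2), with sigma^2 = p(1 - p)."""
    return p * (1 - p) / (n * eps**2)


p, eps = 0.3, 0.05  # illustrative values only
for n in (50, 200, 800):
    print(n, tail_probability(n, p, eps), chebyshev_bound(n, p, eps))
```

Both columns tend to 0 as $n$ grows, as (9.1.5) requires, but the exact tail probability falls much faster than the bound.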

The $k$-dimensional analogue of 9.1.1 is straightforward. Its formulation 
and proof are left to the reader. 

Theorem 9.1.1 is sometimes called the weak law of large numbers and it 
essentially states that 100% of the probability in the distribution of $\bar{x}$ 
ultimately accumulates in any neighborhood containing the value $x = \mu$ as 
$n \to \infty$. The condition which guarantees this accumulation is the existence 
(finiteness) of the mean $\mu$. 


The Cauchy Distribution. An example of a distribution which, at first glance, 
looks fairly well behaved but which does not produce sample means with this 
property of convergence in probability is the Cauchy (1853) distribution having 
p.d.f. 

(9.1.6) $f(x) = \dfrac{\theta_2}{\pi[\theta_2^2 + (x - \theta_1)^2]},$ 

the range of $x$ being $(-\infty, +\infty)$, and $\theta_1$, $\theta_2$ being real numbers with $\theta_2 > 0$. 

It can be verified that the characteristic function $\varphi(t)$ corresponding to (9.1.6) 
is 

(9.1.7) $\varphi(t) = e^{i\theta_1 t - \theta_2 |t|}.$ 

Hence the characteristic function of $\bar{x}$ for a sample of size $n$ is 

(9.1.8) $\left[\varphi\left(\dfrac{t}{n}\right)\right]^n = e^{i\theta_1 t - \theta_2 |t|}.$ 

In other words, $\bar{x}$ has exactly the same characteristic function as $x$, which means 
that the sampling distribution of $\bar{x}$ for any $n$ is exactly the same as that of $x$ in the 
population. The reader should note that $f(x)$ is symmetrical with respect to 
$x = \theta_1$ but that $\theta_1$ is not the mean of the distribution; the mean does not exist. 
Nor does any higher moment. The value $\theta_1$ is, however, both the median and 
the center of symmetry of the distribution, whereas $\theta_2$ is the semi-interquartile range, 
that is, half the distance between the quartiles $x_{0.25} = \theta_1 - \theta_2$ and $x_{0.75} = \theta_1 + \theta_2$. 
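
The failure of the sample mean to concentrate can be seen in two ways: algebraically, since $[\varphi(t/n)]^n$ reproduces $\varphi(t)$, and empirically, since the spread of simulated sample means does not shrink with $n$. The following sketch assumes the location-scale form of the Cauchy p.d.f. with location `theta1` and scale `theta2`, and draws variates by the inverse-c.d.f. method; the function names, seed, and parameter values are illustrative assumptions.

```python
import cmath
import math
import random


def cauchy_cf(t, theta1, theta2):
    """Characteristic function exp(i*theta1*t - theta2*|t|) of the Cauchy law."""
    return cmath.exp(1j * theta1 * t - theta2 * abs(t))


def sample_mean_cf(t, n, theta1, theta2):
    """Characteristic function [phi(t/n)]^n of the mean of n observations."""
    return cauchy_cf(t / n, theta1, theta2) ** n


def interquartile_range_of_means(n, replicates, theta1, theta2, rng):
    """Empirical interquartile range of simulated sample means of size n."""
    means = []
    for _ in range(replicates):
        draws = (
            theta1 + theta2 * math.tan(math.pi * (rng.random() - 0.5))
            for _ in range(n)
        )
        means.append(sum(draws) / n)
    means.sort()
    return means[(3 * replicates) // 4] - means[replicates // 4]


theta1, theta2 = 1.0, 2.0  # illustrative values only
rng = random.Random(0)
iqr_1 = interquartile_range_of_means(1, 4000, theta1, theta2, rng)
iqr_100 = interquartile_range_of_means(100, 4000, theta1, theta2, rng)
print(iqr_1, iqr_100)  # both should be near 2 * theta2
```

For a population with finite variance the second figure would be roughly one tenth of the first; here the two should be of the same size, up to simulation noise.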


9.2 LIMITING DISTRIBUTION OF 
SAMPLE SUMS AND MEANS 

(a) The One-Dimensional Case 

If the variance of the population distribution exists as well as the mean, 
then we can say more about the manner in which the distribution of $\bar{x}$ 
behaves as $n \to \infty$. The fundamental theorem for this situation, which is 
due to Lindeberg (1922), can be stated as follows: 

9.2.1 If $z$ and $\bar{x}$ are the sum and mean, respectively, of a sample of size $n$ 
from a distribution having finite variance $\sigma^2$ and mean $\mu$, then 

(9.2.1) $\lim_{n \to \infty} P\left[\dfrac{z - n\mu}{\sqrt{n}\,\sigma} \le y\right] = \lim_{n \to \infty} P\left[\dfrac{(\bar{x} - \mu)\sqrt{n}}{\sigma} \le y\right] = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{1}{2}u^2}\,du.$ 

Whenever (9.2.1) holds, it is convenient to say that $z$ is asymptotically 
normally distributed according to $N(n\mu, n\sigma^2)$ for large $n$ (or $\bar{x}$ is 
asymptotically normally distributed according to $N(\mu, \sigma^2/n)$ for large $n$). 

Since $(z - n\mu)/\sqrt{n}\,\sigma$ is essentially an alternative notation for the 
random variable $\sqrt{n}(\bar{x} - \mu)/\sigma$, it will be sufficient to consider the former. 
Let $\varphi_n(t)$ be the characteristic function of $(z - n\mu)/\sqrt{n}\,\sigma$. We have 

(9.2.2) $\varphi_n(t) = \mathcal{E}\left[\exp\left(\dfrac{it(z - n\mu)}{\sqrt{n}\,\sigma}\right)\right] = [\varphi(t)]^n$ 

where $\varphi(t)$ is the characteristic function of $(x - \mu)/\sqrt{n}\,\sigma$. From (5.1.10) 
we have 

(9.2.3) $\varphi(t) = 1 - \dfrac{t^2}{2n} + o\left(\dfrac{t^2}{n}\right)$ 

where $n \cdot o(t^2/n) \to 0$ as $n \to \infty$ for any given value of $t \ne 0$; $o(t^2/n) = 0$ 
if $t = 0$. 

Hence, for any given $t$, we have 

(9.2.4) $\lim_{n \to \infty} \varphi_n(t) = \lim_{n \to \infty} \left[1 - \dfrac{t^2}{2n} + o\left(\dfrac{t^2}{n}\right)\right]^n = e^{-\frac{1}{2}t^2}.$ 

But, we know from 7.2.1 that $e^{-\frac{1}{2}t^2}$ is the characteristic function 
associated with the normal distribution $N(0, 1)$. Therefore, it follows from 
5.4.1 that the c.d.f. of $(z - n\mu)/\sqrt{n}\,\sigma$ as $n \to \infty$ converges to the c.d.f. of 
the distribution $N(0, 1)$, and this is equivalent to the statement contained 
in (9.2.1) which completes the proof of 9.2.1. 
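
The limit of $[\varphi(t)]^n$ used in this proof can be watched for a concrete population. As a sketch, assume the population is rectangular on $(0, 1)$, so that $\mu = 1/2$, $\sigma^2 = 1/12$, and $x - \mu$ has characteristic function $\sin(t/2)/(t/2)$; this choice of population and the function names are illustrative assumptions only.

```python
import math

SIGMA = math.sqrt(1 / 12)  # standard deviation of the rectangular law on (0, 1)


def centered_uniform_cf(t):
    """Characteristic function of x - 1/2 when x is rectangular on (0, 1)."""
    return 1.0 if t == 0 else math.sin(t / 2) / (t / 2)


def standardized_sum_cf(t, n):
    """c.f. of (z - n*mu) / (sqrt(n)*sigma), i.e. the n-th power of the c.f.
    of (x - mu) / (sqrt(n)*sigma)."""
    return centered_uniform_cf(t / (math.sqrt(n) * SIGMA)) ** n


def normal_cf(t):
    """Characteristic function of N(0, 1)."""
    return math.exp(-t * t / 2)


for n in (10, 100, 1000):
    gaps = [abs(standardized_sum_cf(t, n) - normal_cf(t)) for t in (0.5, 1.0, 2.0)]
    print(n, max(gaps))
```

The printed gaps fall by roughly a factor of ten each time $n$ is multiplied by ten, in line with the $o(t^2/n)$ remainder.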

The following is an important corollary of 9.2.1, known as the De 
Moivre-Laplace theorem: 

9.2.1a If $z$ is a random variable having the binomial distribution $Bi(n, p)$, 
then $z$ is asymptotically distributed according to $N(np, npq)$. 

It follows from the special case of 8.3.3a for $m = 1$ that if a sample 
of size $n$ is drawn from the binomial distribution $Bi(1, p)$, then the sample 
sum $z$ has the binomial distribution $Bi(n, p)$. Then applying 9.2.1 we 
obtain 9.2.1a. Result 9.2.1a was surmised by De Moivre (1718), but it was 
not firmly established until a century later by Laplace (1812). Gauss 
(1809b) discovered the approximation 9.2.1 by a rather heuristic argument 
in connection with the theory of errors. 
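
How good the normal approximation 9.2.1a is for a given $n$ can be gauged by comparing the exact binomial c.d.f. with the $N(np, npq)$ c.d.f. at every integer. The sketch below takes $q = 1 - p$, applies no continuity correction, and evaluates the binomial p.f. through `math.lgamma` to avoid overflow; the function names and the values of `n` and `p` are illustrative assumptions.

```python
import math


def binomial_pmf(n, p, z):
    """Binomial p.f., evaluated on the log scale to avoid overflow."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(z + 1) - math.lgamma(n - z + 1)
        + z * math.log(p) + (n - z) * math.log(1 - p)
    )
    return math.exp(log_pmf)


def normal_cdf(y):
    """c.d.f. of N(0, 1)."""
    return 0.5 * (1 + math.erf(y / math.sqrt(2)))


def max_cdf_gap(n, p):
    """Largest |P(z <= x) - Phi((x - np)/sqrt(npq))| over x = 0, ..., n."""
    mean, sd = n * p, math.sqrt(n * p * (1 - p))
    cumulative, gap = 0.0, 0.0
    for x in range(n + 1):
        cumulative += binomial_pmf(n, p, x)
        gap = max(gap, abs(cumulative - normal_cdf((x - mean) / sd)))
    return gap


for n in (100, 400, 1600):
    print(n, max_cdf_gap(n, 0.3))
```

The gap roughly halves each time $n$ is quadrupled, the $1/\sqrt{n}$ rate one would expect when a step function is approximated by a continuous c.d.f.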

(b) The Central Limit Theorem 

Theorem 9.2.1 is a special case of a more general result known as the 
central limit theorem, which states that under certain conditions 

(9.2.5) $\lim_{n \to \infty} P\left[\dfrac{\sum_{\xi=1}^{n} (x_\xi - \mu_\xi)}{\sqrt{\sigma_1^2 + \cdots + \sigma_n^2}} \le y\right] = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{1}{2}u^2}\,du$ 

where $(x_1, x_2, \ldots)$ is a sequence of independent random variables with 
means $(\mu_1, \mu_2, \ldots)$ and variances $(\sigma_1^2, \sigma_2^2, \ldots)$ respectively. Various 
studies of conditions under which this statement holds have been made by 
Chebyshev (1890), Feller (1935), Lévy (1935), Lindeberg (1922), Lyapunov 
(1900, 1901), Markov (1900), and others. A comprehensive account of 
the central limit theorem and related problems has been given by 
Gnedenko and Kolmogorov (1954). A modern version of the central limit 
theorem in general form can be stated as follows: 


9.2.2 Let $(x_1, x_2, \ldots)$ be a sequence of independent random variables 
with c.d.f.'s $(F_1(x), F_2(x), \ldots)$, means $(\mu_1, \mu_2, \ldots)$, and variances 
$(\sigma_1^2, \sigma_2^2, \ldots)$. Let $z_n = x_1 + \cdots + x_n$, $m_n = \mu_1 + \cdots + \mu_n$, 
$\tau_n^2 = \sigma_1^2 + \cdots + \sigma_n^2$, and $\tau_n^{*2} = \sigma_1^{*2} + \cdots + \sigma_n^{*2}$ where, for arbitrary $\epsilon > 0$, 

(9.2.6) $\sigma_\xi^{*2} = \int_{\mu_\xi - \epsilon\tau_n}^{\mu_\xi + \epsilon\tau_n} (x - \mu_\xi)^2\,dF_\xi(x), \qquad \xi = 1, \ldots, n.$ 

A necessary and sufficient condition for 

(9.2.7) $\lim_{n \to \infty} P\left(\dfrac{z_n - m_n}{\tau_n} \le y\right) = \dfrac{1}{\sqrt{2\pi}} \int_{-\infty}^{y} e^{-\frac{1}{2}u^2}\,du$ 

is that 

(9.2.8) $\lim_{n \to \infty} \dfrac{\tau_n^{*2}}{\tau_n^2} = 1$ 

and $\lim_{n \to \infty} \tau_n = \infty$, for every $\epsilon > 0$. 

The proof of 9.2.2 is rather long and is omitted. The sufficiency of 
condition (9.2.8) was established by Lindeberg (1922), and the necessity 
by Feller (1935). 

The reader can verify with very little difficulty that if $(x_1, x_2, \ldots)$ is a 
sequence of independent random variables all having identical 
distributions with variance $\sigma^2$, then (9.2.8) is satisfied. 


(c) The $k$-Dimensional Case 

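The $k$-dimensional analogue of 9.2.1 can be stated as follows: 

9.2.3 Suppose $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$ is a sample of size $n$ from a 
$k$-variate distribution having finite means $\mu_i$, $i = 1, \ldots, k$, and 
(positive definite) covariance matrix $\|\sigma_{ij}\|$, $i, j = 1, \ldots, k$. Let 
$(z_1, \ldots, z_k)$ be the sample sums and $(\bar{x}_1, \ldots, \bar{x}_k)$ the sample means, 
as defined in Section 8.1. Then $(z_1, \ldots, z_k)$ and $(\bar{x}_1, \ldots, \bar{x}_k)$ have 
as their asymptotic distributions the $k$-variate distributions $N(\{n\mu_i\}, 
\|n\sigma_{ij}\|)$ and $N(\{\mu_i\}, \|\sigma_{ij}/n\|)$, respectively. That is to say, 

(9.2.9) $\lim_{n \to \infty} P\left(\dfrac{z_i - n\mu_i}{\sqrt{n}} \le y_i,\ i = 1, \ldots, k\right) = \lim_{n \to \infty} P\left((\bar{x}_i - \mu_i)\sqrt{n} \le y_i,\ i = 1, \ldots, k\right)$ 
$= \dfrac{1}{(2\pi)^{k/2}\,|\sigma_{ij}|^{1/2}} \int_{-\infty}^{y_1} \cdots \int_{-\infty}^{y_k} \exp\left(-\tfrac{1}{2} \sum_{i,j=1}^{k} \sigma^{ij} u_i u_j\right) du_1 \cdots du_k,$ 

where $\|\sigma^{ij}\| = \|\sigma_{ij}\|^{-1}$. 

The argument for 9.2.3 is a direct extension of that for 9.2.1. The main 
thing to do here is to set up the characteristic function $\varphi_n(t_1, \ldots, t_k)$ of 
the $k$-dimensional random variable $[(z_1 - n\mu_1)/\sqrt{n}, \ldots, (z_k - n\mu_k)/\sqrt{n}]$ 
and to show that 

(9.2.10) $\lim_{n \to \infty} \varphi_n(t_1, \ldots, t_k) = \exp\left(-\tfrac{1}{2} \sum_{i,j=1}^{k} \sigma_{ij} t_i t_j\right)$ 

which, by the $k$-dimensional analogue of 5.4.1 implies (9.2.9). The details 
of establishing (9.2.10) are straightforward and are left to the reader. 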

One of the most important applications of 9.2.3 is the $k$-dimensional 
extension of De Moivre’s Theorem 9.2.1a, which can be stated as follows: 

9.2.3a If $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$ is a sample of size $n$ from the 
multinomial distribution $M(1; p_1, \ldots, p_k)$, then the sample sums 
$(z_1, \ldots, z_k)$ have, as their asymptotic distribution for large $n$, the 
distribution $N(\{np_i\}, \|n(p_i\delta_{ij} - p_i p_j)\|)$ where $\delta_{ij}$ is the Kronecker 
delta. 

The proof of this statement as a corollary of 9.2.3 amounts to verifying 
from the p.f. of the multinomial distribution $M(1; p_1, \ldots, p_k)$ given by 
(6.3.3) that 

(9.2.11) $\mathcal{E}(x_i) = p_i, \qquad \sigma(x_i, x_j) = \sigma_{ij} = p_i\delta_{ij} - p_i p_j$ 

and is straightforward. 

9.3 ASYMPTOTIC DISTRIBUTION OF FUNCTIONS 
OF SAMPLE MEANS 


(a) General Case 

It is sometimes important to know something about the asymptotic 
distribution of some function of a sample mean $\bar{x}$, say $g(\bar{x})$, for large $n$. 
A useful result on this problem can be stated as follows: 

9.3.1 Suppose $(x_1, \ldots, x_n)$ is a sample from a distribution having mean $\mu$ 
and variance $\sigma^2$, both finite. Let $g(x)$ be a function which has a first 
derivative $g'(x)$ in some neighborhood of the point $x = \mu$ such that 
$g'(\mu) \ne 0$. Then $g(\bar{x})$ has $N(g(\mu), [\sigma g'(\mu)]^2/n)$ as its asymptotic 
distribution for large $n$. 

To prove this statement, let $V(\mu)$ be a neighborhood of $x = \mu$ such that 
$g'(x)$ exists for all values of $x$ in $V(\mu)$. Since the c.d.f. of the population 
from which the sample is drawn has finite mean $\mu$ and variance $\sigma^2$, it 
follows from 4.6.1 that for arbitrary $\epsilon > 0$, there exists an $n_\epsilon$ such that 

(9.3.1) $P(\bar{x} \in V(\mu), \text{ for all } n > n_\epsilon) > 1 - \epsilon.$ 

For any point $x = \bar{x}$ in $V(\mu)$ we may write 

(9.3.2) $g(\bar{x}) = g(\mu) + g'(x^*)(\bar{x} - \mu)$ 

where $x^*$ is a random variable such that $|x^* - \mu| < |\bar{x} - \mu|$. But (9.3.2) 
can be written as 

(9.3.3) $(g(\bar{x}) - g(\mu))\sqrt{n} = g'(x^*)(\bar{x} - \mu)\sqrt{n}.$ 

Let the random variable on the left be denoted by $u_n$ and that on the right 
by $v_n$. The probability that (9.3.3) holds exceeds $1 - \epsilon$ for all $n > n_\epsilon$. 
Since $\epsilon$ is arbitrary, the sequence of random variables $(u_1, u_2, \ldots)$ and the 
sequence $(v_1, v_2, \ldots)$ thus form a stochastic process such that $(u_1, u_2, \ldots)$ 
and $(v_1, v_2, \ldots)$ converge in distribution together if one of them converges 
in distribution. Since $\sigma^2$ is finite, we know from 9.2.1 that the sequence of 
random variables $\sqrt{n}(\bar{x} - \mu)$, $n = 1, 2, \ldots$ converges in distribution to 
the normal distribution $N(0, \sigma^2)$. Since $g'(x)$ exists at all points of $V(\mu)$ it 
is continuous in $V(\mu)$ and hence at $x = \mu$. It follows from 4.3.7 that $g'(x^*)$ 
converges in probability to $g'(\mu)$. 

Therefore 

(9.3.4) $\lim_{n \to \infty} P(v_n \le w) = P(g'(\mu)s \le w)$ 

where $s$ has the distribution $N(0, \sigma^2)$, which implies that 

(9.3.5) $\lim_{n \to \infty} P(u_n \le w) = P(t \le w)$ 

where $t$ has the distribution $N(0, [\sigma g'(\mu)]^2)$. But (9.3.5) is equivalent to 
the statement that the asymptotic distribution of $g(\bar{x})$ for large $n$ is 
$N(g(\mu), [\sigma g'(\mu)]^2/n)$, which concludes the proof of 9.3.1. 
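
The asymptotic variance $[\sigma g'(\mu)]^2/n$ in 9.3.1 can be compared with an exact finite-sample variance in a simple case. The sketch below assumes, for illustration only, a Bernoulli population with parameter `p` and the function $g(x) = x^2$, and computes $n \cdot \sigma^2(g(\bar{x}))$ exactly from the binomial distribution of the sample sum. Note that 9.3.1 asserts convergence in distribution, which does not by itself guarantee convergence of variances; the agreement below is specific to this bounded example. All names are hypothetical.

```python
import math


def binomial_pmf(n, p, z):
    """Binomial p.f., evaluated on the log scale to avoid overflow."""
    log_pmf = (
        math.lgamma(n + 1) - math.lgamma(z + 1) - math.lgamma(n - z + 1)
        + z * math.log(p) + (n - z) * math.log(1 - p)
    )
    return math.exp(log_pmf)


def scaled_variance_of_g(n, p, g):
    """n * Var(g(xbar)) for the mean xbar of n Bernoulli(p) observations."""
    values = [g(z / n) for z in range(n + 1)]
    probs = [binomial_pmf(n, p, z) for z in range(n + 1)]
    mean = sum(v * q for v, q in zip(values, probs))
    return n * sum((v - mean) ** 2 * q for v, q in zip(values, probs))


def square(x):
    """Illustrative choice of g, with derivative g'(x) = 2x."""
    return x * x


p = 0.3  # illustrative value only
asymptotic = (2 * p) ** 2 * p * (1 - p)  # [sigma g'(mu)]^2 with mu = p

for n in (20, 200, 2000):
    print(n, scaled_variance_of_g(n, p, square), asymptotic)
```

As $n$ grows the exact scaled variance approaches the value $[\sigma g'(\mu)]^2$ from above in this example.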

The extension of 9.3.1 to the $k$-dimensional case can be stated as follows: 

9.3.1a Suppose $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$ is a sample from a 
$k$-dimensional distribution with finite means $\{\mu_i\}$ and positive definite 
covariance matrix $\|\sigma_{ij}\|$, $i, j = 1, \ldots, k$. Let $g(x_1, \ldots, x_k)$ be a 
function which possesses first derivatives $\partial g/\partial x_i = g_i$, say, 
$i = 1, \ldots, k$ at all points in some neighborhood of $(\mu_1, \ldots, \mu_k)$, and 
let $g_i^0 = g_i(\mu_1, \ldots, \mu_k)$. Then if at least one of the $g_i^0$ is $\ne 0$, 
$g(\bar{x}_1, \ldots, \bar{x}_k)$ has the asymptotic distribution 
$N\left(g(\mu_1, \ldots, \mu_k),\ \dfrac{1}{n} \sum_{i,j=1}^{k} \sigma_{ij}\, g_i^0 g_j^0\right)$ for large $n$. 

The proof is straightforward and is left as an exercise for the reader. 
(b) A Quadratic Function of Sample Means and Sums 

It should be noted that (9.2.9) essentially states that we have a sequence 
of $k$-dimensional random variables 

(9.3.6) $\left(\dfrac{z_1 - n\mu_1}{\sqrt{n}}, \ldots, \dfrac{z_k - n\mu_k}{\sqrt{n}}\right), \qquad n = 1, 2, \ldots$ 

or equivalently a sequence of $k$-dimensional random variables 

(9.3.7) $\left((\bar{x}_1 - \mu_1)\sqrt{n}, \ldots, (\bar{x}_k - \mu_k)\sqrt{n}\right), \qquad n = 1, 2, \ldots$ 

which converges in distribution to the $k$-dimensional normal distribution 
$N(\{0\}, \|\sigma_{ij}\|)$. For convenience, denote the c.d.f. of the random variable 
(9.3.6) [or (9.3.7)] for a given $n$ by $F_n(y_1, \ldots, y_k)$ and the c.d.f. of the 
distribution $N(\{0\}, \|\sigma_{ij}\|)$ by $\Phi(y_1, \ldots, y_k)$, the form of which is given in 
(9.2.9). 

In 7.8.2 it was shown that if $(x_1, \ldots, x_k)$ has the distribution $N(\{\mu_i\}, 
\|\sigma_{ij}\|)$, then $\sum_{i,j=1}^{k} \sigma^{ij}(x_i - \mu_i)(x_j - \mu_j)$ has the chi-square distribution $C(k)$. 

If we set up the quadratic form 

(9.3.8) $Q_n = \sum_{i,j=1}^{k} \sigma^{ij} \left(\dfrac{z_i - n\mu_i}{\sqrt{n}}\right)\left(\dfrac{z_j - n\mu_j}{\sqrt{n}}\right) = n \sum_{i,j=1}^{k} \sigma^{ij}(\bar{x}_i - \mu_i)(\bar{x}_j - \mu_j),$ 

the sequence of random variables $Q_n$, $n = 1, 2, \ldots$ converges in 
distribution to the chi-square distribution $C(k)$ as $n \to \infty$. For it follows from 
4.3.4 and 4.3.6 that since the stochastic process (9.3.6) converges in 
distribution to the $k$-dimensional normal distribution $N(\{0\}, \|\sigma_{ij}\|)$, then 
$Q_1, Q_2, \ldots$ converges in distribution to the chi-square distribution $C(k)$. 
We can summarize as follows: 

9.3.2 If $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$ is a sample from a $k$-dimensional 
distribution, with finite means $\mu_i$, $i = 1, \ldots, k$, and finite, positive 
definite, covariance matrix $\|\sigma_{ij}\|$, then $Q_n$ as given by (9.3.8) has the 
chi-square distribution $C(k)$ as its asymptotic distribution for large $n$, 
that is, 

(9.3.9) $\lim_{n \to \infty} P(Q_n \le y) = \dfrac{1}{2\Gamma(\frac{1}{2}k)} \int_0^y \left(\dfrac{u}{2}\right)^{\frac{1}{2}k - 1} e^{-\frac{1}{2}u}\,du.$ 

It should be noted that $Q_n$ is a function of $\bar{x}_1, \ldots, \bar{x}_k$, say $Q_n(\bar{x}_1, \ldots, \bar{x}_k)$, 
which violates the condition in 9.3.1a that at least one of its first derivatives 
evaluated at $(\mu_1, \ldots, \mu_k)$ is $\ne 0$. Yet $Q_n$ has a well-behaved limiting 
distribution as $n \to \infty$, namely, $C(k)$. 
(c) Pearson’s Chi-Square Goodness-of-Fit Criterion


An important special case of 9.3.2 is that in which the $k$-dimensional
population being sampled has the multinomial distribution $M(1; p_1, \ldots, p_k)$,
for which the means and covariance matrix are given by (9.2.11). In
this case it can be verified that

(9.3.10)  $\|\sigma^{ij}\| = \|p_i\delta_{ij} - p_i p_j\|^{-1} = \left\|\frac{\delta_{ij}}{p_i} + \frac{1}{p_{k+1}}\right\|.$

Substituting $\delta_{ij}/p_i + 1/p_{k+1}$ for $\sigma^{ij}$ and $p_i$ for $\mu_i$ in (9.3.8), we find that
$Q_n$ can be reduced as follows:

(9.3.11)  $Q_n = \sum_{i=1}^{k+1} \frac{(z_i - np_i)^2}{np_i}, \quad \text{where } \sum_{i=1}^{k+1} z_i = n\sum_{i=1}^{k+1} p_i = n.$

Therefore we have the following corollary of 9.3.2, recalling from 8.3.3b
that if a sample of size $n$ is drawn from the multinomial distribution
$M(1; p_1, \ldots, p_k)$ the sample sums have the distribution
$M(n; p_1, \ldots, p_k)$:


9.3.2a If $(z_1, \ldots, z_k)$ is a $k$-dimensional random variable having the
multinomial distribution $M(n; p_1, \ldots, p_k)$ then

(9.3.12)  $\sum_{i=1}^{k+1} \frac{(z_i - np_i)^2}{np_i}$

is asymptotically distributed according to the chi-square distribution
$C(k)$ for large $n$.


Note that (9.3.12) is the sum of squares of discrepancies between the
sample “frequencies” $z_i$ and their mean values, weighted inversely by the
mean values, and is the well-known chi-square goodness of fit criterion
originally introduced by K. Pearson (1900). It may be regarded as an
index of the extent to which the $z_i$ depart collectively from their respective
mean values, and the significance of the magnitude of this index must be
established for a particular (large) sample in terms of probability computed
from its asymptotic distribution, namely, the chi-square distribution $C(k)$.
Further consideration of this problem leads us into the theory of testing 
statistical hypotheses, which is treated in Chapter 13. 
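
As a small illustration of (9.3.12), the following Python sketch evaluates Pearson's criterion for a set of cell counts. The function name `pearson_chi_square` and the die-throwing counts are hypothetical and chosen only for illustration; exact rational arithmetic is used so the value can be checked by hand.

```python
from fractions import Fraction


def pearson_chi_square(counts, probs):
    """Pearson's criterion (9.3.12): sum over all k + 1 cells of
    (z_i - n p_i)^2 / (n p_i), where n is the total count."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    return sum((z - n * p) ** 2 / (n * p) for z, p in zip(counts, probs))


# Hypothetical data: a die thrown 60 times, so k + 1 = 6 cells and k = 5.
counts = [8, 12, 9, 11, 14, 6]
probs = [Fraction(1, 6)] * 6
statistic = pearson_chi_square(counts, probs)
print(statistic)  # 21/5
```

With six cells the value would be referred to the chi-square distribution $C(5)$, in line with 9.3.2a.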


9.4 ASYMPTOTIC EXPANSION OF DISTRIBUTION 
OF SAMPLE SUM 

Theorem 9.2.1 contains a statement of the limiting form of the distribution
of $(z - n\mu)/(\sqrt{n}\,\sigma)$ as $n \to \infty$. One problem which arises here is the
determination, for large values of $n$, of a higher degree of approximation
to the distribution of $(z - n\mu)/(\sqrt{n}\,\sigma)$ than that provided by the distribution
$N(0, 1)$. We shall examine this problem for populations having p.d.f.’s.

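Suppose the central moments $\mu_1, \mu_2, \ldots$ of a continuous c.d.f.
$F(x)$ exist and are finite. Then if $\varphi(t)$ is the characteristic function of
$(x - \mu)/(\sqrt{n}\,\sigma)$, we have

(9.4.1)  $\varphi(t) = 1 - \frac{t^2}{2n} + \sum_{j=3}^{\infty} \frac{\alpha_j (it)^j}{j!\,(\sqrt{n})^j}$

where $\alpha_j = \mu_j/\sigma^j$. But the characteristic function of $(z - n\mu)/(\sqrt{n}\,\sigma)$,
namely, $\varphi_n(t)$, is given by

(9.4.2)  $\varphi_n(t) = [\varphi(t)]^n.$

Taking logarithms, we find

(9.4.3)  $\log \varphi_n(t) = -\frac{t^2}{2} + n\sum_{j=3}^{\infty} \frac{\kappa_j (it)^j}{j!\,(\sqrt{n})^j}$

where the $\kappa_j$ are semi-invariants of the distribution of $(x - \mu)/\sigma$ in the
population. Therefore, we have

(9.4.4)  $\varphi_n(t) = e^{-t^2/2} \exp\left[n\sum_{j=3}^{\infty} \frac{\kappa_j (it)^j}{j!\,(\sqrt{n})^j}\right]$

which can be written as

(9.4.5)  $\varphi_n(t) = e^{-t^2/2}\left[1 + \sum_{j=1}^{\infty} \frac{u_j(it)}{(\sqrt{n})^j}\right]$

where $u_j(it)$ is a polynomial of degree $3j$ in $(it)$ whose coefficients are
functions of the $\kappa_j$'s but do not depend on $n$. The lowest power of $(it)$ in
$u_j(it)$ is $j + 2$.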

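If we let

(9.4.6)  $F_n(x) = P\left(\frac{z - n\mu}{\sqrt{n}\,\sigma} \le x\right)$

and put $x' = (x + y)/2$, $\delta = (x - y)/2$ in (5.1.14), then for any $x$ and $y$
where $x > y$, we have

(9.4.7)  $F_n(x) - F_n(y) = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{\sin \frac{1}{2}t(x - y)}{t}\, e^{-it\frac{1}{2}(x + y)}\varphi_n(t)\, dt.$

Substituting the expression for $\varphi_n(t)$ from (9.4.5) and simplifying, we
obtain

(9.4.8)  $F_n(x) - F_n(y) = \frac{1}{\pi}\int_{-\infty}^{\infty} \frac{e^{-itx} - e^{-ity}}{2(-it)}\, e^{-t^2/2}\left[1 + \sum_{j=1}^{\infty} \frac{u_j(it)}{(\sqrt{n})^j}\right] dt.$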
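But it can be verified without particular difficulty that

(9.4.9)  $\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{e^{-itx} - e^{-ity}}{2(-it)}\, e^{-t^2/2}\, dt = \Phi(x) - \Phi(y)$

where $\Phi(x)$ is the c.d.f. of the distribution $N(0, 1)$. All other terms in
(9.4.7) are of the following form, except for a constant multiplier,

$\frac{1}{2\pi}\int_{-\infty}^{\infty} (-it)^r \left(e^{-itx} - e^{-ity}\right) e^{-t^2/2}\, dt$

which has the value

(9.4.10)  $\frac{d^r}{dx^r}\,\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx} e^{-t^2/2}\, dt - \frac{d^r}{dy^r}\,\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ity} e^{-t^2/2}\, dt = \Phi^{(r+1)}(x) - \Phi^{(r+1)}(y)$

where

(9.4.11)  $\Phi^{(r)}(x) = \frac{d^r}{dx^r}\Phi(x).$

Hence, we obtain for (9.4.7)

(9.4.12)  $F_n(x) - F_n(y) = \Phi(x) - \Phi(y) + \sum_{j=1}^{\infty} \frac{u_j^*(x) - u_j^*(y)}{(\sqrt{n})^j}$

where $u_j^*(x)$ and $u_j^*(y)$ are the functions one obtains by replacing $(it)^r$ in
$u_j(it)$ by $(-1)^r\Phi^{(r)}(x)$ and $(-1)^r\Phi^{(r)}(y)$ respectively.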

If we let $y \to -\infty$ in (9.4.12), we obtain as the asymptotic expansion of
$F_n(x)$

(9.4.13)  $F_n(x) = \Phi(x) + \sum_{j=1}^{\infty} \frac{u_j^*(x)}{(\sqrt{n})^j}.$

If $F_n(x)$ has p.d.f. $f_n(x)$, we find, by taking the derivative of (9.4.13)
with respect to $x$ that

(9.4.14)  $f_n(x) = \Phi^{(1)}(x) + \sum_{j=1}^{\infty} \frac{u_j^{*\prime}(x)}{(\sqrt{n})^j}$

where $u_j^{*\prime}(x)$ is the first derivative of $u_j^*(x)$.

As a matter of fact, even if $F_n(x)$ has no p.d.f. (that is, if $F_n(x)$ were a
discrete c.d.f.), (9.4.13) can be formally established for points of continuity
of $F_n(x)$. However, in this case the expression given by (9.4.14) would
be meaningless.

We shall not write out the general expression for the function $u_j^*(x)$.
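But it is of some interest to write out the expressions on the right-hand
sides of (9.4.13) and (9.4.14) to terms of order $n^{-3/2}$. These are

(9.4.15)  $F_n(x) = \Phi(x) - \frac{1}{\sqrt{n}}\left(\frac{\alpha_3}{3!}\right)\Phi^{(3)}(x) + \frac{1}{n}\left[\frac{1}{4!}(\alpha_4 - 3)\Phi^{(4)}(x) + \frac{10}{6!}\alpha_3^2\Phi^{(6)}(x)\right]$
$\qquad - \frac{1}{n^{3/2}}\left[\frac{1}{5!}(\alpha_5 - 10\alpha_3)\Phi^{(5)}(x) + \frac{35}{7!}\alpha_3(\alpha_4 - 3)\Phi^{(7)}(x) + \frac{280}{9!}\alpha_3^3\Phi^{(9)}(x)\right] + O\left(\frac{1}{n^2}\right)$

and

(9.4.16)  $f_n(x) = \Phi^{(1)}(x) - \frac{1}{\sqrt{n}}\left(\frac{\alpha_3}{3!}\right)\Phi^{(4)}(x) + \frac{1}{n}\left[\frac{1}{4!}(\alpha_4 - 3)\Phi^{(5)}(x) + \frac{10}{6!}\alpha_3^2\Phi^{(7)}(x)\right]$
$\qquad - \frac{1}{n^{3/2}}\left[\frac{1}{5!}(\alpha_5 - 10\alpha_3)\Phi^{(6)}(x) + \frac{35}{7!}\alpha_3(\alpha_4 - 3)\Phi^{(8)}(x) + \frac{280}{9!}\alpha_3^3\Phi^{(10)}(x)\right] + O\left(\frac{1}{n^2}\right).$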

We may summarize these results, which were originally obtained by 
Edgeworth (1905), as follows: 

9.4.1 If $(x_1, \ldots, x_n)$ is a sample from a continuous p.d.f. with finite
moments $\mu_1, \mu_2, \ldots$, then the c.d.f. of $(z - n\mu)/(\sqrt{n}\,\sigma)$ can
be expanded in the form (9.4.13), the explicit expansion to terms of
order $n^{-3/2}$ being given by (9.4.15).

The quantities $\alpha_3$ $(= \mu_3/\sigma^3)$ and $\alpha_4 - 3$ $(= \mu_4/\sigma^4 - 3)$, usually denoted
by $\gamma_1$ and $\gamma_2$ respectively, are regarded as indices of skewness and kurtosis
respectively, of a distribution function having mean $\mu$, variance $\sigma^2$, and
third and fourth central moments $\mu_3$ and $\mu_4$. These two constants play
an important role in the degree to which the c.d.f. $F_n(x)$ can be approximated
by the c.d.f. $\Phi(x)$ of the distribution $N(0, 1)$. It will be noted from
an inspection of (9.4.15) that, in general, $F_n(x)$ is approximated by $\Phi(x)$
except for terms of order $1/\sqrt{n}$. But the following corollary of 9.4.1 gives
conditions under which higher orders of approximation hold:

9.4.1a If the skewness of the distribution from which $(x_1, \ldots, x_n)$ is drawn
is zero, $F_n(x)$ is approximated by $\Phi(x)$ except for terms of order
$1/n$; if both the skewness and kurtosis are zero, $F_n(x)$ is approximated
by $\Phi(x)$ except for terms of order $n^{-3/2}$.

It should be pointed out that Lyapunov (1901) was the pioneer on the
problem of determining higher degrees of approximation to the distribution
of $\bar{x}$ in large samples than that provided by the normal distribution.
Cramér (1937) has shown that the remainder term in (9.4.13) and (9.4.14)
is of the same order as the first term neglected. Esseen (1944) has made
more recent investigations of the accuracy of such asymptotic expansions.
Asymptotic expansions in powers of $1/\sqrt{n}$ have been established for
other statistics than sample means by Cramér (1937), Hsu (1945a, 1945b),
Chung (1946), and others. An expository article on asymptotic approximations
to distributions with an extensive bibliography has been published
by Wallace (1958).
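
The expansion (9.4.15) can be evaluated numerically. The Python sketch below keeps only the terms through order $1/n$ and uses the standard identity $\Phi^{(r)}(x) = (-1)^{r-1} He_{r-1}(x)\,\Phi^{(1)}(x)$, where $He_m$ is the probabilists' Hermite polynomial; that identity and all function names are assumptions of this illustration and are not taken from the text above.

```python
import math


def standard_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def standard_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def hermite(m, x):
    """Probabilists' Hermite polynomial He_m(x) via He_{k+1} = x He_k - k He_{k-1}."""
    previous, current = 1.0, x
    if m == 0:
        return previous
    for k in range(1, m):
        previous, current = current, x * current - k * previous
    return current


def normal_cdf_derivative(r, x):
    """r-th derivative (r >= 1) of the N(0, 1) c.d.f."""
    return (-1) ** (r - 1) * hermite(r - 1, x) * standard_normal_pdf(x)


def edgeworth_cdf(x, n, alpha3, alpha4):
    """Right-hand side of (9.4.15), truncated after the 1/n term."""
    term_half = -(alpha3 / 6.0) * normal_cdf_derivative(3, x) / math.sqrt(n)
    term_one = ((alpha4 - 3.0) / 24.0 * normal_cdf_derivative(4, x)
                + 10.0 / 720.0 * alpha3 ** 2 * normal_cdf_derivative(6, x)) / n
    return standard_normal_cdf(x) + term_half + term_one


# Sums of n = 25 exponential variables: alpha3 = 2, alpha4 = 9.
approx = edgeworth_cdf(0.0, 25, 2.0, 9.0)
```

For a population with zero skewness and $\alpha_4 = 3$ both correction terms vanish and the function returns $\Phi(x)$, which matches the statement of 9.4.1a to this order.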


9.5 LIMITING DISTRIBUTIONS OF LINEAR FUNCTIONS IN 
LARGE SAMPLES FROM LARGE FINITE POPULATIONS 


In the preceding section we have considered the limiting distributions 
of sums and means of samples from infinite populations. In this section 
we shall consider limiting distributions of linear functions of elements of 
samples from finite populations as sample size and population size increase 
indefinitely. A general theorem of basic importance in dealing with this 
problem due to Wald and Wolfowitz (1944) can be stated as follows: 


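9.5.1 Let $(a_{N1}, \ldots, a_{NN})$, $(x_{N1}, \ldots, x_{NN})$, $N = 1, 2, \ldots$ be two sets
of sequences of real numbers such that for $r = 3, 4, \ldots$ and large $N$

(9.5.1)  $\frac{m_{r,N}(a)}{[m_{2,N}(a)]^{r/2}} = O(1)$

and

(9.5.2)  $\frac{m_{r,N}(x)}{[m_{2,N}(x)]^{r/2}} = O(1)$

where

(9.5.3)  $m_{r,N}(a) = \frac{1}{N}\sum_{i=1}^{N}(a_{Ni} - \bar{a}_N)^r, \qquad \bar{a}_N = \frac{1}{N}\sum_{i=1}^{N} a_{Ni}$

with similar definitions of $m_{r,N}(x)$ and $\bar{x}_N$. For each value of $N$ let
$(x_1, \ldots, x_N)$ be a vector random variable whose sample space is the
set of all $N!$ permutations of $(x_{N1}, \ldots, x_{NN})$. Let

(9.5.4)  $L_N = \sum_{i=1}^{N} a_{Ni}x_i.$

Then

(9.5.5)  $\mathcal{E}(L_N) = N\bar{a}_N\bar{x}_N$

and

(9.5.6)  $\sigma^2(L_N) = \frac{N^2}{N - 1}\, m_{2,N}(a) \cdot m_{2,N}(x).$

Furthermore,

(9.5.7)  $\lim_{N \to \infty} P\left(\frac{L_N - \mathcal{E}(L_N)}{\sigma(L_N)} < y\right) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{y} e^{-\frac{1}{2}w^2}\, dw.$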
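
The mean and variance of a linear function of a randomly permuted set of numbers can be checked by brute force when $N$ is small. The Python sketch below enumerates all $N!$ equally likely permutations and compares the exact mean and variance with $N\bar{a}_N\bar{x}_N$ and $\frac{N^2}{N-1}m_{2,N}(a)\,m_{2,N}(x)$. The numerical values of `a` and `x` are hypothetical, and the function names are introduced only for this illustration.

```python
from fractions import Fraction
from itertools import permutations


def central_moment(values, r):
    """m_r = (1/N) * sum of (v - mean)^r, computed exactly."""
    size = len(values)
    mean = Fraction(sum(values), size)
    return sum((v - mean) ** r for v in values) / size


def exact_mean_and_variance(a, x):
    """Mean and variance of L = sum a_i x_i over all equally likely
    permutations (x_1, ..., x_N) of the values in x."""
    totals = [sum(ai * xi for ai, xi in zip(a, perm)) for perm in permutations(x)]
    count = len(totals)
    mean = Fraction(sum(totals), count)
    variance = sum((t - mean) ** 2 for t in totals) / count
    return mean, variance


a = [1, 2, 4, 7]    # hypothetical coefficients a_{N1}, ..., a_{NN}
x = [3, 5, 6, 10]   # hypothetical population values x_{N1}, ..., x_{NN}
N = len(x)

mean_L, var_L = exact_mean_and_variance(a, x)
formula_mean = N * Fraction(sum(a), N) * Fraction(sum(x), N)
formula_var = Fraction(N * N, N - 1) * central_moment(a, 2) * central_moment(x, 2)
print(mean_L == formula_mean, var_L == formula_var)  # True True
```

Enumeration is only practical for very small $N$; the point of the sketch is to confirm the two moment formulas on a concrete case, not to illustrate the limiting normality.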


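The proof of 9.5.1 is straightforward but tedious. The procedure is to
first set

$a'_{Ni} = \frac{a_{Ni} - \bar{a}_N}{\sqrt{m_{2,N}(a)}}, \qquad x'_{Ni} = \frac{x_{Ni} - \bar{x}_N}{\sqrt{m_{2,N}(x)}}.$

Then the sets of sequences $(a'_{N1}, \ldots, a'_{NN})$, $(x'_{N1}, \ldots, x'_{NN})$, $N = 1, 2, \ldots$
satisfy conditions similar to (9.5.1) and (9.5.2). If $(x'_1, \ldots, x'_N)$ is the
same permutation of $(x'_{N1}, \ldots, x'_{NN})$ as $(x_1, \ldots, x_N)$ is of $(x_{N1}, \ldots, x_{NN})$
and if we let $L'_N = \sum_{i=1}^{N} a'_{Ni}x'_i$, then $\mathcal{E}(L'_N) = 0$ and

(9.5.8)  $\frac{L'_N}{\sigma(L'_N)} = \frac{L_N - \mathcal{E}(L_N)}{\sigma(L_N)}.$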

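Omitting detailed moment computations, which are given by Wald and
Wolfowitz (1944), it is found that for any positive integer $k$

(9.5.9)  $\mathcal{E}(L_N'^{\,s}) = \begin{cases} \dfrac{(2k)!}{2^k k!}N^k + o(N^k), & s = 2k, \\[1ex] o(N^{k + \frac{1}{2}}), & s = 2k + 1, \end{cases}$

and hence

(9.5.10)  $\lim_{N \to \infty} \mathcal{E}\left[\left(\frac{L'_N}{\sigma(L'_N)}\right)^s\right] = \lim_{N \to \infty} \mathcal{E}\left[\left(\frac{L_N - \mathcal{E}(L_N)}{\sigma(L_N)}\right)^s\right] = \begin{cases} \dfrac{(2k)!}{2^k k!}, & s = 2k, \\[1ex] 0, & s = 2k + 1. \end{cases}$

Thus as $N \to \infty$ the $s$th moment of $[L_N - \mathcal{E}(L_N)]/\sigma(L_N)$ converges to the
$s$th moment of a random variable having the distribution $N(0, 1)$. Therefore
it follows from 5.5.3 that the limiting distribution of
$[L_N - \mathcal{E}(L_N)]/\sigma(L_N)$ as $N \to \infty$ is $N(0, 1)$.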

If, in the definition of $L_N$ as given in (9.5.4), we choose $a_{Ni} = 1/n$,
$i = 1, \ldots, n$ and $a_{Ni} = 0$, $i = n + 1, \ldots, N$ then $L_N$ is the mean of a
sample of size $n$ from a finite population whose $N$ elements have
$x_{N1}, \ldots, x_{NN}$ as their $x$-values. Furthermore, $\mathcal{E}(L_N) = \mu_N$, where
$\mu_N = \frac{1}{N}\sum_{i=1}^{N} x_{Ni}$, and $\sigma^2(L_N) = (1/n - 1/N)\sigma_N^2$, where $\mu_N$ and $\sigma_N^2$ are the mean
and variance of $\pi_N$ as defined in Section 8.5. In this case, (9.5.1) reduces
to the condition that the limit of


(9.5.11)  $\frac{n}{N}\left(\frac{N}{n} - 1\right)^{1 - r/2}\left[\left(\frac{N}{n} - 1\right)^{r-1} + (-1)^r\right]$

be finite for $r \ge 3$ as $N, n \to \infty$. It is seen that this occurs if $\lim_{N,n \to \infty} N/n = c$
where $1 < c < +\infty$. Therefore we have the following corollary of
9.5.1:

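9.5.1a Suppose $\{\pi_N, N = 1, 2, \ldots\}$ is a sequence of finite populations such
that $\pi_N$ has mean $\mu_N$ and variance $\sigma_N^2$, and let $\bar{x}_n$ be the mean of
a random sample of size $n$ from $\pi_N$. Then if $(x_{N1}, \ldots, x_{NN};$
$N = 1, 2, \ldots)$ satisfies (9.5.2) for large $N$ and if $N, n \to \infty$ in such
a way that $\lim N/n = c$ where $1 < c < +\infty$ then

(9.5.12)  $\lim P\left(\frac{\bar{x}_n - \mu_N}{\sigma_N\sqrt{\frac{1}{n} - \frac{1}{N}}} < y\right) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{y} e^{-\frac{1}{2}u^2}\, du.$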


9.6 ASYMPTOTIC DISTRIBUTIONS CONCERNING 
ORDER STATISTICS 

(a) Limiting Distributions of Sums of Coverages 

In 8.7.2 it was shown that if $(x_{(1)}, \ldots, x_{(n)})$ are the order statistics of
a sample from continuous c.d.f. $F(x)$, the random variable $y_{(k)} = F(x_{(k)})$
has the c.d.f.

(9.6.1)  $D_n(y_{(k)}) = \frac{\Gamma(n + 1)}{\Gamma(k)\Gamma(n - k + 1)}\int_0^{y_{(k)}} x^{k-1}(1 - x)^{n-k}\, dx.$

In fact, it should be borne in mind from 8.7.6 that (9.6.1) is the c.d.f. of
the sum of any $k$ of the $n + 1$ coverages $F(x_{(1)}), F(x_{(2)}) - F(x_{(1)}), \ldots,$
$1 - F(x_{(n)})$. $F(x_{(j+k)}) - F(x_{(j)})$ is such a sum where $0 < j \le n - k$. Now
consider the random variable $w_k = ny_{(k)}$. If we denote by $H_n(w_k)$ the
c.d.f. of $w_k$, we have

(9.6.2)  $H_n(w_k) = D_n\left(\frac{w_k}{n}\right) = \int_0^{w_k} h_n(y)\, dy$

where

$h_n(y) = \frac{\Gamma(n + 1)}{n^k\Gamma(k)\Gamma(n - k + 1)}\, y^{k-1}\left(1 - \frac{y}{n}\right)^{n-k}.$

It is evident that as $n \to \infty$ the sequence $h_n(y)$, $n = k, k + 1, \ldots$
converges to the function $\frac{y^{k-1}e^{-y}}{\Gamma(k)}$ uniformly over the interval $(0, w_k)$,
and hence

(9.6.3)  $\lim_{n \to \infty} H_n(w_k) = \frac{1}{\Gamma(k)}\int_0^{w_k} y^{k-1}e^{-y}\, dy.$

Therefore we have the result that

9.6.1 If $(x_{(1)}, \ldots, x_{(n)})$ are the order statistics of a sample from a population
having a continuous c.d.f. $F(x)$, then for a fixed $k$, $nF(x_{(k)})$,
$n = k, k + 1, \ldots$, is a sequence of random variables converging in
distribution to the gamma distribution $G(k)$.

It should be noted that 9.6.1 holds when $F(x_{(k)})$ is replaced by the sum
of any $k$ of the coverages $F(x_{(1)}), F(x_{(2)}) - F(x_{(1)}), \ldots, 1 - F(x_{(n)})$, for
example $1 - F(x_{(n-k+1)})$ or $F(x_{(j+k)}) - F(x_{(j)})$. Also, the statement is true
if $F(x_{(k)})$ is replaced by the sum of any $k$ coverages determined by a sample of size $n$
from a population having a continuous multidimensional c.d.f. [see
Section 8.7(c)].
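
The convergence stated in 9.6.1 can be examined numerically without simulation. The Python sketch below compares the exact c.d.f. of $nF(x_{(k)})$ with the c.d.f. of the gamma distribution $G(k)$. It relies on two standard identities that are assumptions of this illustration rather than statements from the text: the c.d.f. (9.6.1) at $y$ equals the probability that at least $k$ of $n$ independent uniform variables fall below $y$, and for integer $k$ the gamma c.d.f. is a Poisson tail sum. The function names are hypothetical.

```python
import math


def cdf_scaled_coverage(w, k, n):
    """Exact P(n F(x_(k)) <= w): at least k of n uniforms fall below w / n."""
    y = w / n
    if y <= 0.0:
        return 0.0
    if y >= 1.0:
        return 1.0
    fewer_than_k = sum(math.comb(n, j) * y ** j * (1.0 - y) ** (n - j)
                       for j in range(k))
    return 1.0 - fewer_than_k


def cdf_gamma_integer_shape(w, k):
    """C.d.f. of the gamma distribution G(k) for integer k, as a Poisson tail."""
    if w <= 0.0:
        return 0.0
    return 1.0 - math.exp(-w) * sum(w ** j / math.factorial(j) for j in range(k))


k, w = 3, 2.5
limit_value = cdf_gamma_integer_shape(w, k)
for n in (10, 100, 10000):
    print(n, abs(cdf_scaled_coverage(w, k, n) - limit_value))
```

The printed discrepancies shrink as $n$ grows, which is what (9.6.3) asserts for fixed $k$.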

Theorem 9.6.1 can be extended to the case of two or more sums of fixed 
numbers of coverages. In the case of two such sums the analogous result 
can be stated as follows: 

9.6.2 Suppose $(x_{(1)}, \ldots, x_{(n)})$ are the order statistics of a sample from
a continuous c.d.f. $F(x)$. Then if $k_1$ and $k_2$ are fixed integers, the
sequence of pairs of random variables $(nF(x_{(k_1)}), nF(x_{(k_1+k_2)}))$, $n =$
$m, m + 1, \ldots$ where $m \ge k_1 + k_2$ converges in distribution to a
random variable having p.d.f.

$f(w_1, w_2) = \frac{1}{\Gamma(k_1)\Gamma(k_2)}\, w_1^{k_1-1}(w_2 - w_1)^{k_2-1}e^{-w_2}$

in the region $0 < w_1 < w_2 < \infty$, and $f(w_1, w_2) = 0$ elsewhere in the
$w_1w_2$-plane.


The proof of this statement is similar to the argument used in establishing
9.6.1 and is left as an exercise for the reader. It should be particularly
noted that 9.6.2 holds if $F(x_{(k_1)})$ is replaced by the sum of any $k_1$ coverages
and $F(x_{(k_1+k_2)})$ by the sum of any $k_1 + k_2$ coverages which includes the first
$k_1$ coverages.

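Now let us consider the limiting distribution of $y_{(k)}$ as $n \to \infty$, not for
a fixed $k$, but for an increasing sequence of values of $k$ and $n$ such that
$k/n = p_n = p + O(1/n)$. Consider the sequence of random variables
$v_n = (y_{(np_n)} - p)\sqrt{n}$, $n = m, m + 1, \ldots$; $m > np_n$, where $y_{(np_n)} =$
$F(x_{(np_n)})$. Denoting the c.d.f. of $v_n$ by $H_n(v)$, we have

(9.6.4)  $H_n(v) = D_n\left(p + \frac{v}{\sqrt{n}}\right)$

where $D_n(y)$ is defined in (9.6.1). Now $D_n(p + v/\sqrt{n})$ can be expressed as

(9.6.5)  $\int_{-p\sqrt{n}}^{v} h_n^*(z)\, dz$

where

(9.6.6)  $h_n^*(z) = \frac{\Gamma(n + 1)}{\sqrt{n}\,\Gamma(k)\Gamma(n - k + 1)}\left(p + \frac{z}{\sqrt{n}}\right)^{k-1}\left(1 - p - \frac{z}{\sqrt{n}}\right)^{n-k}$
$\qquad = \frac{\Gamma(n + 1)(p + z/\sqrt{n})^{-1}}{\sqrt{n}\,\Gamma(p_nn)\Gamma((1 - p_n)n + 1)}\left[p^{p_n}(1 - p)^{1-p_n}\right]^n\left(1 + \frac{z}{p\sqrt{n}}\right)^{np_n}\left(1 - \frac{z}{(1 - p)\sqrt{n}}\right)^{n(1-p_n)}.$

But since $p_n = p + O(1/n)$, we have

(9.6.7)  $\left(1 + \frac{z}{p\sqrt{n}}\right)^{np_n}\left(1 - \frac{z}{(1 - p)\sqrt{n}}\right)^{n(1-p_n)} = \left(1 - \frac{z^2}{2np(1 - p)} + \varphi(z, n)\right)^n$

where $\varphi(z, n)$ is such that $\lim_{n \to \infty} n \cdot \varphi(z, n) = 0$ for any value of $z$. By
making use of Stirling’s approximation (7.6.27) for $\Gamma(g)$ for large values
of $g$ we find that

(9.6.8)  $\frac{\Gamma(n + 1)}{\sqrt{n}\,\Gamma(p_nn)\Gamma((1 - p_n)n + 1)}\left[p^{p_n}(1 - p)^{1-p_n}\right]^n = \frac{p}{\sqrt{2\pi p(1 - p)}} + o(1).$

Finally we obtain

(9.6.9)  $h_n^*(z) = \left(p + \frac{z}{\sqrt{n}}\right)^{-1}\left(\frac{p}{\sqrt{2\pi p(1 - p)}} + o(1)\right)\left(1 - \frac{z^2}{2np(1 - p)} + \varphi(z, n)\right)^n$

which converges uniformly to the function

(9.6.10)  $h^*(z) = \frac{1}{\sqrt{2\pi p(1 - p)}}\, e^{-z^2/2p(1-p)}$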

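in any interval $(-K, v)$ as $n \to \infty$. For any $\epsilon > 0$ we can choose $K$ and $n_1$
so that for $n > n_1$

(9.6.11)  $\int_{-\infty}^{-K} h^*(z)\, dz < \frac{\epsilon}{3}$ and $\left|\int_{-p\sqrt{n}}^{-K} h_n^*(z)\, dz\right| < \frac{\epsilon}{3}.$

Furthermore, there exists an $n_2 > K^2/p^2$ such that for any $v$ and $n > n_2$

(9.6.12)  $\left|\int_{-K}^{v} h^*(z)\, dz - \int_{-K}^{v} h_n^*(z)\, dz\right| < \frac{\epsilon}{3}.$

Therefore, for $n > \max(n_1, n_2)$

(9.6.13)  $\left|\int_{-\infty}^{v} h^*(z)\, dz - \int_{-p\sqrt{n}}^{v} h_n^*(z)\, dz\right| < \epsilon$

and hence

(9.6.14)  $\lim_{n \to \infty} P\left(\frac{(y_{(np_n)} - p)\sqrt{n}}{\sqrt{p(1 - p)}} < w\right) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{w} e^{-\frac{1}{2}u^2}\, du.$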

Summarizing, we have the following result: 


9.6.3 If $(x_{(1)}, \ldots, x_{(n)})$ are the order statistics of a sample from a continuous
c.d.f. $F(x)$, and $np_n$ is an integer such that $p_n = p + O(1/n)$,
where $0 < p < 1$, then for large $n$, $F(x_{(np_n)})$ is asymptotically
distributed according to $N\left(p, \frac{p(1 - p)}{n}\right)$.


It is to be noted that 9.6.3 holds if $F(x_{(np_n)})$ is replaced by the sum of any
$np_n$ coverages in one or more dimensions.

The statement can be extended without significant difficulties to the 
case of several sums of coverages. It will be sufficient to state the result 
for two sums, leaving the proof to the reader. 

9.6.4 If $(x_{(1)}, \ldots, x_{(n)})$ are the order statistics of a sample from a population
having a continuous c.d.f. $F(x)$ and if $np_{1n}$ and $np_{2n}$ are integers
such that $p_{1n} = p_1 + O(1/n)$, $p_{2n} = p_2 + O(1/n)$ where $0 < p_1 <$
$p_2 < 1$, then the two-dimensional random variable
$(F(x_{(np_{1n})}), F(x_{(np_{2n})}))$ has $N(\{p_i\}, \|\sigma_{ij}/n\|)$ as its asymptotic distribution
for large $n$, where $\sigma_{11} = p_1(1 - p_1)$, $\sigma_{12} = \sigma_{21} = p_1(1 - p_2)$, and
$\sigma_{22} = p_2(1 - p_2)$.


(b) Limiting Distributions of Order Statistics 

The limiting distributions obtained in 9.6.1, 9.6.2, 9.6.3, and 9.6.4 
referred directly to distribution properties of sums of coverages and only 
implicitly to large-sample distribution properties of order statistics them¬ 
selves. Since the population distribution F(x) was assumed to be contin¬ 
uous, it is possible to make some statements about the limiting distribution 
of the order statistics themselves. For instance, in 9.6.1, if $F^{-1}(y)$ is the
inverse* of $F(x)$, suppose we consider a sequence of values of $n$ for which
$F^{-1}(k/n)$ has a unique inverse. Since $F(x)$ is continuous and defined for
all real values of $x$ it is evident that there exists an infinite sequence of
such values of $n$, say $n_1, n_2, \ldots$. Then it follows from 9.6.1 that for fixed $k$,

(9.6.15)  $\lim_{i \to \infty} P\left(x_{(k)} < F^{-1}\left(\frac{w}{n_i}\right)\right) = \frac{1}{\Gamma(k)}\int_0^{w} y^{k-1}e^{-y}\, dy$

and hence for large $n$,

(9.6.16)  $P(x_{(k)} < v) \cong \frac{1}{\Gamma(k)}\int_0^{nF(v)} y^{k-1}e^{-y}\, dy.$


Formulas analogous to (9.6.15) and (9.6.16) for $x_{(n-k'+1)}$ for fixed $k'$ can
be similarly established. Asymptotic results of this type, particularly for
$k = 1$ (the smallest order statistic) and $k' = 1$ (the largest order statistic)
have been considered in detail by Dodd (1923), Fisher and Tippett (1928),
Fréchet (1927), Gumbel (1935, 1958), and Smirnov (1935).

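Similarly, from 9.6.3, we may write

(9.6.17)  $\lim_{i \to \infty} P\left(x_{(n_ip_{n_i})} < F^{-1}\left(p + w\sqrt{\frac{p(1 - p)}{n_i}}\right)\right) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{w} e^{-\frac{1}{2}y^2}\, dy$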

and for large $n$,

(9.6.18)  $P(x_{(np_n)} < v) \cong \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{w} e^{-\frac{1}{2}y^2}\, dy$

where

$w = \frac{(F(v) - p)\sqrt{n}}{\sqrt{p(1 - p)}}.$

Formulas (9.6.16) and (9.6.18) can be extended without further important
difficulties to two or more order statistics. In the two-dimensional case
the results follow directly from 9.6.2 and 9.6.4.

It is evident from (9.6.14) that as $n \to \infty$, $F(x_{(np_n)})$ converges in probability
to the constant $p$. Now suppose $F(x) = p$ has a unique solution $x_p$,
that is, that a unique $p$th quantile exists. Then since $F(x)$ is continuous,
$x_{(np_n)}$ will converge in probability to $x_p$. Therefore we have the following
result:


9.6.5 If in addition to the assumptions of 9.6.3 we add the condition that
a unique $p$th quantile $x_p$ exists, then as $n \to \infty$, $x_{(np_n)}$ converges in
probability to $x_p$.

A similar statement can be made for the case of the two order statistics
$x_{(np_{1n})}$, $x_{(np_{2n})}$ or any larger specified number of similar order statistics.
Now suppose $F(x)$ has a derivative $f(x)$ in some neighborhood $V(x_p)$ of
the point $x = x_p$ such that $f(x_p) > 0$. Then a unique $p$th quantile $x_p$ exists
and hence, by 9.6.5, $x_{(np_n)}$ converges in probability to $x_p$. Now if $x_{(np_n)}$ is
any point in $V(x_p)$ we may write

(9.6.19)  $F(x_{(np_n)}) = p + f(x^*)(x_{(np_n)} - x_p)$

where $x^*$ is a random variable such that $|x^* - x_p| < |x_{(np_n)} - x_p|$. But
since $x_{(np_n)}$ converges in probability to the quantile $x_p$ as $n \to \infty$, then for
an arbitrary $\epsilon > 0$, we can choose an $n_\epsilon$ such that

(9.6.20)  $P\left(F(x_{(np_n)}) = p + f(x^*)(x_{(np_n)} - x_p) \text{ for all } n > n_\epsilon\right) > 1 - \epsilon,$

which amounts to stating that

(9.6.21)  $\frac{(F(x_{(np_n)}) - p)\sqrt{n}}{\sqrt{p(1 - p)}}, \quad n = m, m + 1, \ldots,\ m > np_n$

and

(9.6.22)  $\frac{f(x^*)(x_{(np_n)} - x_p)\sqrt{n}}{\sqrt{p(1 - p)}}, \quad n = m, m + 1, \ldots,\ m > np_n$

form a stochastic process such that the two sequences converge in distribution
together to the distribution $N(0, 1)$.

The fact that $F(x)$ has a derivative $f(x)$ in $V(x_p)$ implies that $f(x)$ is continuous
in $V(x_p)$ and hence at $x = x_p$. Therefore, by 4.3.5, $x^*$ converges
in probability to the constant $x_p$ and by 4.3.7 $f(x^*)$ also converges
in probability to $f(x_p)$ as $n \to \infty$. Finally, by applying 4.3.3 we see that
the sequence of random variables (9.6.22) and the sequence

(9.6.22a)  $\frac{f(x_p)(x_{(np_n)} - x_p)\sqrt{n}}{\sqrt{p(1 - p)}}, \quad n = m, m + 1, \ldots,\ m > np_n$

constitute a stochastic process such that the two sequences converge
together in distribution to the distribution $N(0, 1)$.

Summarizing, we have the following result: 

9.6.6 If in addition to the assumptions of 9.6.3 we add the condition that
$F(x)$ has a derivative $f(x)$ in some neighborhood $V(x_p)$ of $x_p$ such
that $f(x_p) > 0$, then, for large $n$, $x_{(np_n)}$ is asymptotically distributed
according to $N\left(x_p, \frac{p(1 - p)}{nf^2(x_p)}\right)$.

Example. An interesting special case of 9.6.6 arises for $p = \frac{1}{2}$. In this case,
$x_{(np_n)}$ is the sample median and we see that, for large $n$, its asymptotic distribution
is $N\left(x_{0.5}, \frac{1}{4nf^2(x_{0.5})}\right)$. If the population distribution is $N(\mu, \sigma^2)$, then the sample
median has $N(\mu, \pi\sigma^2/2n)$ as its asymptotic distribution for large $n$.
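
The variance in 9.6.6 is easy to evaluate and to compare with simulation. The Python sketch below computes $p(1 - p)/(nf^2(x_p))$ and, for the sample median of a normal population, compares it with the variance of simulated medians. The sample size, number of replications, seed, and function names are arbitrary choices made for this illustration.

```python
import math
import random
import statistics


def asymptotic_quantile_variance(p, density_at_quantile, n):
    """Variance p(1 - p) / (n f(x_p)^2) of the asymptotic distribution in 9.6.6."""
    return p * (1.0 - p) / (n * density_at_quantile ** 2)


def simulated_median_variance(n, sigma, replications, seed):
    """Variance of the sample median over repeated N(0, sigma^2) samples of size n."""
    rng = random.Random(seed)
    medians = []
    for _ in range(replications):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        medians.append(statistics.median(sample))
    return statistics.pvariance(medians)


n, sigma = 101, 2.0
density_at_median = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # normal p.d.f. at the mean
asymptotic = asymptotic_quantile_variance(0.5, density_at_median, n)
simulated = simulated_median_variance(n, sigma, replications=2000, seed=12345)
print(asymptotic, simulated)
```

For the normal case `asymptotic` reduces to $\pi\sigma^2/2n$; the simulated value should be close to it but will differ somewhat because $n$ is finite and the number of replications is limited.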

A statement similar to 9.6.6 holds, of course, for the joint distribution
of two or more order statistics. In the case of the two order statistics
$x_{(np_{1n})}$, $x_{(np_{2n})}$ we can say that

9.6.7 If in addition to the assumptions of 9.6.4, we add the condition that
$F(x)$ has a derivative $f(x)$ in neighborhoods of each of the points
$x = x_{p_1}$ and $x = x_{p_2}$, $0 < p_1 < p_2 < 1$, such that $f(x_{p_1}) > 0$ and $f(x_{p_2}) > 0$,
then, for large $n$, the random variable $(x_{(np_{1n})}, x_{(np_{2n})})$ is asymptotically
distributed according to $N(\{x_{p_i}\}, \|\sigma_{ij}/n\|)$, $i, j = 1, 2$, where

$\sigma_{11} = \frac{p_1(1 - p_1)}{f^2(x_{p_1})}, \quad \sigma_{12} = \sigma_{21} = \frac{p_1(1 - p_2)}{f(x_{p_1})f(x_{p_2})}, \quad \sigma_{22} = \frac{p_2(1 - p_2)}{f^2(x_{p_2})}.$

Theorems 9.6.6 and 9.6.7 were established by Smirnov (1935), although
the formulas for the variances and covariance of $x_{(np_{1n})}$ and $x_{(np_{2n})}$ were
originally established by K. Pearson (1920). Mosteller (1946) has extended
9.6.7 to the case of several order statistics.

PROBLEMS 

9.1 State and prove the $k$-dimensional version of 9.1.1.

9.2 If (a ?!^,.. . == 1,, .., n) is a sample from a ^-dimensional distri¬ 

bution having (finite) means (/ij,..., /x^) and (finite) covariance matrix ||cr,jl| and 
if (^i,..., ^fc) is the vector of means of the sample, show that for arbitrary 
di >0, i = 1,... ,k, 

PQXf -^t\ <6i,i ^ . .,k)> I - 

and hence that (iCi,..., «*) converges in probability to (//i,..., 

9.3 (Continuation) Show that for arbitrary <5^ > 0, 

where |la*^|l = and hence that (»i,. .., converges in probability to 

(f*i, • . •, 

9.4 Prove 9.2.3.

9.5 If $(x_1, \ldots, x_n)$ is a sample from a Poisson distribution $Po(\mu)$, show that
for large $n$, $2\sqrt{\bar{x}}$ has as its asymptotic distribution $N(2\sqrt{\mu}, 1/n)$. Show that the
same result holds if $(x_1, \ldots, x_n)$ is a sample from the gamma distribution $G(\mu)$.

9.6 If $(x_1, \ldots, x_n)$ is a sample from the binomial distribution $Bi(1, p)$ show
that the asymptotic distribution of $\sin^{-1}(2\bar{x} - 1)$, for large $n$, is

$N(\sin^{-1}(2p - 1), 1/n).$

9.7 If (a^i,.. ., is a sample from the waiting-time distribution having p.f. 

a: = 1, 2,. .. 

where 0 < p < \, p -f ^ = 1, show that the asymptotic distribution of 
log [^(1 -f Vi — i/ij) — for large /i, is 
9.8 If $(x_1, \ldots, x_n)$ is a sample from the rectangular distribution $R(\frac{1}{2}\theta, \theta)$
show that the asymptotic distribution of $\sqrt{12}\log(2\bar{x})$ for large $n$, is

$N(\sqrt{12}\log\theta, 4/n).$

9.9 If X is the mean of a sample of size n from the rectangular distribution 

/?(J, 1), determine the expansions (9.4.15) and (9.4.16) for the c.d.f. and p.d.f. of 
the random variable (x up to terms of order rr^, 

9.10 If $(x_1, \ldots, x_n)$ is a sample from a distribution having finite mean $\mu$ and
variance $\sigma^2$, show that the sample variance $s^2$ converges in probability to $\sigma^2$,
and also that the asymptotic distribution of $\sqrt{n}(\bar{x} - \mu)/s$, for large $n$, is $N(0, 1)$.

9.11 If $n_1$, $\bar{x}_1$, $s_1^2$ are the size, mean, and variance of a sample from a distribution
having mean $\mu_1$ and variance $\sigma_1^2$, whereas $n_2$, $\bar{x}_2$, $s_2^2$ are the size, mean, and
variance of an independent sample from a distribution having mean $\mu_2$ and
variance $\sigma_2^2$, show that the limiting distribution of

$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$

as $n_1, n_2 \to \infty$, is $N(0, 1)$.

9.12 If $\tilde{x}$ is the median of a sample of size $n$ from a continuous c.d.f. $F(x)$,
show that the asymptotic distribution of $F(\tilde{x})$, for large $n$, is $N(\frac{1}{2}, 1/(4n))$.

9.13 If .. ., x^^0 are the order statistics of a sample from a continuous 

c.d.f. F{x), show that the limiting distribution of — J dF{x^ as « co 

is the gamma distribution G(2). 

9.14 If $(x_1, \ldots, x_{2n+1})$ is a sample from a distribution having p.d.f.
$\lambda e^{-\lambda x}$, $x > 0$, $\lambda > 0$, show that, for large $n$, the median $\tilde{x}$ of the sample has

$N(\log 2/\lambda, 1/(2\lambda^2n))$

as its asymptotic distribution.

9.15 If X and x are the mean and median of a sample of size n from a 
population having the distribution yV(/<, show that the asymptotic distri¬ 
bution of (x, x) for large n is 


N 


('■•'‘•I ^11) 




ct2 

(t2 



n 

n 

n 






n 



where 
9.16 (Fisher’s (1925a) transformation of the correlation coefficient). It is known
that the distribution of the sample correlation coefficient $r$ in samples from a
two-dimensional normal distribution having correlation coefficient $\rho$ has, as its
asymptotic distribution, for large $n$, the distribution $N(\rho, (1 - \rho^2)^2/n)$. Show
that the asymptotic distribution of $\frac{1}{2}\log\frac{1 + r}{1 - r}$ for large $n$ is
$N\left(\frac{1}{2}\log\frac{1 + \rho}{1 - \rho}, \frac{1}{n}\right)$.

9.17 Prove 9.6.2. 

9.18 Prove 9.6.4. 



CHAPTER 10 


Linear Statistical Estimation 


10.1 INTRODUCTORY COMMENTS 

In Chapters 8 and 9 we have presented some results on the theory of 
sampling, that is, probability theory of certain functions of the elements or 
components of a sample from a given c.d.f. F(x). In problems of applied 
statistics, F(x) is usually unknown and the main purpose of sampling is to 
acquire information on the basis of which statements or inferences can be 
made about F(x), or some of its properties. These statements are made in 
terms of functions of the elements (components) of a sample and ex¬ 
pressed as probability statements. Assumptions which can be made about 
F(x) in advance of any sampling can range all the way from those stating 
that F(x) satisfies only the basic properties of a c.d.f. given in 2.2.1 to a 
complete specification of F(x) for all values of $x$ in $R_1$. In general, assumptions
which can be made about F(x) in any given situation lie between
these extremes. For instance, it might be assumed that F(x) has a finite,
but unknown, mean $\mu$ and variance $\sigma^2$, the problem being to devise
estimators for $\mu$ and $\sigma^2$ as functions of the elements of a random sample
from F(x). Thus, in Section 8.2 it was shown that the sample mean $\bar{x}$
and the sample variance $s^2$ in samples from infinite populations are
unbiased estimators for $\mu$ and $\sigma^2$, that is, $\mathcal{E}(\bar{x}) = \mu$ and $\mathcal{E}(s^2) = \sigma^2$.

More generally, the problem is to make inferences about a population 
distribution function beyond the assumptions made concerning the dis¬ 
tribution function by utilizing information in a sample from the distribu¬ 
tion. 

It is sufficient for many problems in applied statistics to devise from 
samples unbiased estimators for parameters of population distributions 
such as means, variances, covariances, and regression coefficients involved 
in the distributions. A class of relatively simple estimators of such parameters,
known as linear estimators, can be devised as linear forms of the
sample elements and involves no stronger assumptions about the population
c.d.f. than finiteness of first- and second-order moments of the
components of the sample. There are, of course, other classes of estimators,
but these are considered in Chapters 11 and 12. In general, an
estimator for a population parameter $\theta$ is an observable random variable
determined from a sample, that is, one which is a known function of
sample elements which is used in place of the unknown true value of the
parameter which it estimates. In devising an estimator for a parameter $\theta$,
it is important to construct it from the sample so that its distribution
concentrates as much as possible in some sense around the true value $\theta_0$
when calculated from samples from a population in which $\theta$ actually has the
value $\theta_0$. As we shall see, a fairly natural criterion for measuring this
concentration in the case of linear estimators is the variance of the 
estimator. In this chapter we shall confine our attention to the theory of 
linear estimators, to criteria for constructing them, and to their appli¬ 
cation to some of the more important statistical problems. Similar 
consideration is given to other classes of estimators in Chapters 11 and 12. 

As has been pointed out in Sections 8.2 and 8.5, the sample mean, 
which, of course, is a linear function of the sample elements, is an unbiased 
estimator for the population mean in both infinite and finite populations. 
But the sample mean is only one of many possible unbiased linear estimators
of the population mean that we could devise. For example, if
$(x_1, \ldots, x_n)$ is a sample from a population (finite or infinite) with mean $\mu$,
then it is evident that $a_1x_1 + \cdots + a_nx_n$ is an unbiased estimator for
$\mu$ if $a_1, \ldots, a_n$ are known constants such that $a_1 + \cdots + a_n = 1$.

Estimators of this type are called unbiased linear estimators. If one is to 
select an unbiased linear estimator for p, one must consider what criteria 
to use in making the selection. One widely accepted criterion is to choose 
that linear estimator from all unbiased linear estimators having the smallest 
possible variance. Such an estimator will be called a minimum variance 
linear estimator. 

Similarly, if $(x_1, \ldots, x_n)$ is a sample from a finite or infinite population
having mean $\mu$ and variance $\sigma^2$, any quadratic form $\sum_{i,j=1}^{n} a_{ij}x_ix_j$
whose mean value is $\sigma^2$ is an unbiased quadratic estimator for $\sigma^2$. If a
unique unbiased quadratic estimator for $\sigma^2$ exists having smallest possible
variance, it will be called the minimum variance quadratic estimator for $\sigma^2$.
However, it should be noted that any quadratic estimator is a linear 
function of quadratic terms, and much of the theory of linear estimation 
to be developed in this chapter applies to quadratic estimators. 

Minimum variance linear estimators themselves usually have variances 
which can be estimated without bias by quadratic estimators. 

In this chapter we shall deal with the theory of linear estimation and
its application to the estimation of means and variances of population
distributions. In these problems no assumptions are required about the
distribution of the random variables used in the linear estimation process
except finiteness of means and of the elements of the covariance matrix
of these random variables. The underlying random variables for quadratic
estimators are quadratic forms in the sample components.

It will be sometimes convenient to use the notation $\mathcal{E}^{-1}(\theta)$ to denote an
unbiased estimator for $\theta$. Thus, if $T$ is an observable random variable
which is an unbiased estimator for $\theta$, that is, if

$\mathcal{E}(T) = \theta$

we may write

$\mathcal{E}^{-1}(\theta) = T.$

A simple but basic theorem in linear estimation theory which will be 
used repeatedly in dealing with linear and quadratic estimators is the 
following: 

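10.1.1 Suppose $(x_1, \ldots, x_k)$ are random variables whose mean values are

$\mathcal{E}(x_i) = \sum_{j=1}^{k} a_{ij}\theta_j, \quad i = 1, \ldots, k$

where $\theta_1, \ldots, \theta_k$ are unknown parameters and where $\|a_{ij}\|$ is a
nonsingular matrix whose elements are known (that is, do not depend
on the parameters $\theta_1, \ldots, \theta_k$). Then

$\mathcal{E}^{-1}(\theta_i) = \sum_{j=1}^{k} a^{ij}x_j, \quad i = 1, \ldots, k$

are unbiased linear estimators for $\theta_1, \ldots, \theta_k$ respectively, where
$\|a^{ij}\| = \|a_{ij}\|^{-1}$.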

The proof is left to the reader. 
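
The content of 10.1.1 can be illustrated for $k = 2$ with a short Python sketch. Because the estimators are linear in the $x_i$, applying them to the mean values $\sum_j a_{ij}\theta_j$ must return the $\theta_i$ exactly, which is the unbiasedness property. The matrix, the parameter values, and the function names below are hypothetical and serve only as an example.

```python
from fractions import Fraction


def inverse_2x2(a):
    """Exact inverse of a nonsingular 2 x 2 matrix with integer entries."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(a22, det), Fraction(-a12, det)],
            [Fraction(-a21, det), Fraction(a11, det)]]


def unbiased_linear_estimates(a, x):
    """Estimators sum_j a^{ij} x_j of 10.1.1, where ||a^{ij}|| inverts ||a_ij||."""
    a_inv = inverse_2x2(a)
    return [sum(a_inv[i][j] * x[j] for j in range(2)) for i in range(2)]


a = [[1, 1], [1, -1]]   # hypothetical known coefficients a_ij
theta = [3, 5]          # hypothetical parameter values
mean_values = [sum(a[i][j] * theta[j] for j in range(2)) for i in range(2)]

# Applying the estimators to the mean values recovers the parameters.
recovered = unbiased_linear_estimates(a, mean_values)
print(recovered == theta)  # True
```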

10.2 MINIMUM VARIANCE ESTIMATORS FOR THE 
MEAN AND VARIANCE OF A POPULATION 
FROM RANDOM SAMPLES 

(a) Minimum Variance Linear Estimator for the Population Mean 

Suppose $(x_1, \ldots, x_n)$ is a sample from a distribution having mean $\mu$
and variance $\sigma^2$. We have seen in Section 8.2 that the sample mean is an
unbiased estimator for $\mu$. We now show that $\bar{x}$ is the minimum variance
linear estimator for $\mu$. Let $\mathcal{E}^{-1}(\mu)$ be any unbiased linear estimator for $\mu$,
that is, let

(10.2.1)  $\mathcal{E}^{-1}(\mu) = a_1x_1 + \cdots + a_nx_n$

where

(10.2.2)  $a_1 + \cdots + a_n = 1.$

For the variance of $\mathcal{E}^{-1}(\mu)$ we have from 3.6.1a

(10.2.3)  $\sigma^2(\mathcal{E}^{-1}(\mu)) = (a_1^2 + \cdots + a_n^2)\sigma^2.$

It is seen that $\sigma^2(\mathcal{E}^{-1}(\mu))$, subject to (10.2.2), has a unique minimum which occurs for

(10.2.4)  $a_1 = \cdots = a_n = \frac{1}{n}.$

But for this choice of values of the $a$’s, $\mathcal{E}^{-1}(\mu)$ is identical with $\bar{x}$. Therefore

10.2.1 If $(x_1, \ldots, x_n)$ is a sample of size $n$ from a distribution having mean
$\mu$ and variance $\sigma^2$, then $\bar{x}$ is the minimum variance linear estimator
for $\mu$.
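
A short numerical check of (10.2.3) and (10.2.4) is given below as a Python sketch: for weights summing to 1, the variance factor $a_1^2 + \cdots + a_n^2$ is smallest when all weights equal $1/n$. The particular weights and the function name are hypothetical.

```python
from fractions import Fraction


def linear_estimator_variance(weights, sigma2):
    """Variance (a_1^2 + ... + a_n^2) sigma^2 of an unbiased linear estimator
    a_1 x_1 + ... + a_n x_n whose weights sum to 1."""
    if sum(weights) != 1:
        raise ValueError("weights must sum to 1 for the estimator to be unbiased")
    return sum(w * w for w in weights) * sigma2


sigma2 = Fraction(1)
equal_weights = [Fraction(1, 4)] * 4
unequal_weights = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]

variance_equal = linear_estimator_variance(equal_weights, sigma2)      # 1/4
variance_unequal = linear_estimator_variance(unequal_weights, sigma2)  # 11/32
print(variance_equal < variance_unequal)  # True
```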

It can also be verified that the same conclusion holds if $(x_1, \ldots, x_n)$ is
a sample from a finite population. The proof is left to the reader.

(b) Minimum Variance Quadratic Estimator for the Population Variance 

To deal with this problem of quadratic estimators it will be convenient 
to consider first the following theorem from some general results on 
unbiased estimation theory by Halmos (1946): 

10.2.2  Let $(x_1, \ldots, x_n)$ be a sample from a c.d.f. $F(x)$ and let $g(x_1, \ldots, x_n)$
be any statistic having mean $\theta$ and variance $\sigma^2 < +\infty$. Let
$(\alpha_{1i}, \ldots, \alpha_{ni})$, $i = 1, \ldots, n!$, be the set (suitably indexed) of all $n!$
permutations of the integers $1, \ldots, n$ and let
$g_i(x_1, \ldots, x_n) = g(x_{\alpha_{1i}}, \ldots, x_{\alpha_{ni}})$.
If $\bar{g} = \frac{1}{n!}\sum_{i=1}^{n!} g_i$, then $\mathcal{E}(\bar{g}) = \theta$ and the
variance of $\bar{g}$ is smaller than that of $g(x_1, \ldots, x_n)$ unless
$g(x_1, \ldots, x_n)$ is symmetric in $x_1, \ldots, x_n$ with probability 1, in
which case $\bar{g}$ is identical with $g(x_1, \ldots, x_n)$.

To establish 10.2.2, we first note that since $(x_1, \ldots, x_n)$ is a random
sample from $F(x)$ we have $\mathcal{E}(g_i) = \theta$, $i = 1, \ldots, n!$. Therefore $\mathcal{E}(\bar{g}) = \theta$.
Furthermore, $\sigma^2(g_i) = \sigma^2$, $i = 1, \ldots, n!$, and

(10.2.5)    $\sigma^2(\bar{g}) = \frac{1}{n!}\,\sigma^2 + \left(\frac{1}{n!}\right)^2 \sum_{i \neq j} \mathrm{cov}(g_i, g_j).$

But $\mathrm{cov}(g_i, g_j) \le \sigma^2$. Hence

(10.2.6)    $\sigma^2(\bar{g}) \le \sigma^2.$

But equality in (10.2.6) holds if and only if $g_i = a + b g_j$ with probability 1
in the sample space for all $i \neq j$, where $a$ and $b$ are constants which
must have the values 0 and 1 respectively. Hence

(10.2.7)    $g_i(x_1, \ldots, x_n) = g_j(x_1, \ldots, x_n)$

for all points $(x_1, \ldots, x_n)$ in the sample space (except possibly for a
set of zero probability) and for $i \neq j = 1, \ldots, n!$. This condition implies
that $g(x_1, \ldots, x_n)$ is symmetric in $(x_1, \ldots, x_n)$. The conclusion of 10.2.2
follows at once.
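To make the averaging in 10.2.2 concrete, here is a minimal sketch that forms $\bar{g}$ by brute force over all $n!$ orderings of a small sample. The names `symmetrize`, `first_only`, and `half_squared_difference` and the sample values are hypothetical; the two statistics were chosen because each is unbiased but not symmetric.

```python
from fractions import Fraction
from itertools import permutations

def symmetrize(g, xs):
    """Average g over all n! orderings of the sample xs (the statistic g-bar of 10.2.2)."""
    values = [g(p) for p in permutations(xs)]
    return sum(values, Fraction(0)) / len(values)

def first_only(x):
    # Unbiased for the mean but not symmetric: uses only the first observation.
    return Fraction(x[0])

def half_squared_difference(x):
    # Unbiased for the variance but not symmetric: uses only the first two observations.
    return Fraction(x[0] - x[1]) ** 2 / 2

sample = [1, 2, 6]
print(symmetrize(first_only, sample))               # 3, the sample mean
print(symmetrize(half_squared_difference, sample))  # 7, the sample variance s^2
```

In both cases the symmetrized statistic is the familiar symmetric estimator, which is the direction in which 10.2.2 says the variance can only decrease.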

The following corollary of 10.2.2 states that under certain mild conditions
the sample variance is the minimum variance quadratic estimator
for the variance of the population distribution. The proof is a straightforward
application of a slightly extended version of 10.2.2 described
below and is left as an exercise for the reader.

10.2.2a  If $(x_1, \ldots, x_n)$ is a sample from a c.d.f. $F(x)$ having mean $\mu$ and
variance $\sigma^2$ and finite third and fourth moments, and if $\{a_{\xi\eta}\}$ is any set
of constants for which

            $Q = \sum_{\xi,\eta=1}^{n} a_{\xi\eta}(x_\xi - \bar{x})(x_\eta - \bar{x})$

has mean $\sigma^2$, the values of the $\{a_{\xi\eta}\}$ for which $Q$ has minimum
variance are $a_{\xi\xi} = 1/(n-1)$, $\xi = 1, \ldots, n$, $a_{\xi\eta} = 0$, $\xi \neq \eta$, in
which case $Q$ reduces to the sample variance $s^2$.

The reader should note that 10.2.2 holds with only minor changes in
the argument if the assumption that $(x_1, \ldots, x_n)$ is a random sample
from $F(x)$ is replaced by the assumption that $(x_1, \ldots, x_n)$ is a vector
random variable whose c.d.f. $F(x_1, \ldots, x_n)$ is symmetric in $x_1, \ldots, x_n$.

With this extended version of 10.2.2, it is seen that the following version
of 10.2.2a states that in sampling from a finite population the sample
variance $s^2$ is the minimum variance unbiased quadratic estimator for the
population variance $\sigma^2$:

10.2.2b  If $(x_1, \ldots, x_n)$ is a sample from a finite population having variance
$\sigma^2$ and if $\{a_{\xi\eta}\}$ is any set of constants for which

            $Q = \sum_{\xi,\eta=1}^{n} a_{\xi\eta}(x_\xi - \bar{x})(x_\eta - \bar{x})$

has mean $\sigma^2$, the values of $\{a_{\xi\eta}\}$ for which $Q$ has minimum variance
are those given in 10.2.2a, in which case $Q$ reduces to the sample
variance $s^2$.

(c) Interval Estimators for $\mu$ and $\sigma^2$ in a Normal Distribution

If we make the further assumption that $(x_1, \ldots, x_n)$ is a sample from
the distribution $N(\mu, \sigma^2)$, then we have a special case where we can obtain
what are known as interval estimators for $\mu$ and $\sigma^2$. This case deserves
special attention. We know from 8.4.3 that $\sqrt{n}(\bar{x} - \mu)/s$ has the “Student”
distribution $S(n - 1)$. This fact enables us to make the following statement:

(10.2.8)    $P(t_1 < \sqrt{n}(\bar{x} - \mu)/s < t_2) = \int_{t_1}^{t_2} f_{n-1}(t)\,dt = \gamma$

where $f_{n-1}(t)$ is given by (7.8.4), and $t_1$ and $t_2$ are chosen so that the integral
in (10.2.8) has the value $\gamma$. But the statement

            $P(t_1 < \sqrt{n}(\bar{x} - \mu)/s < t_2) = \gamma$

is equivalent to the statement

(10.2.9)    $P\left(\bar{x} - t_2\frac{s}{\sqrt{n}} < \mu < \bar{x} - t_1\frac{s}{\sqrt{n}}\right) = \gamma.$

Thus, $(\bar{x} - t_2(s/\sqrt{n}),\ \bar{x} - t_1(s/\sqrt{n}))$ is an observable random interval
such that the probability is $\gamma$ that it includes the point $\mu$. The interval is
called a $100\gamma\%$ confidence interval for $\mu$, and $\gamma$ is called the confidence
coefficient. It is an example of a method of setting up an estimator for a
parameter by using a (realizable or observable) random interval with a
specified probability of including the “true” value of the parameter.
Such intervals are sometimes called interval estimators. We shall defer a
discussion of interval estimation under more general conditions until
Chapters 11 and 12.
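As an illustration of how such an interval is computed from data, the following sketch evaluates the symmetric choice $t_2 = -t_1$, assuming the required point of the Student distribution is supplied by the caller from a table. The function name `t_confidence_interval` and the five observations are hypothetical.

```python
import math

def t_confidence_interval(sample, t_crit):
    """Return (x-bar - t_crit*s/sqrt(n), x-bar + t_crit*s/sqrt(n)).

    t_crit is the point t_2 = -t_1 of the Student distribution S(n - 1) for
    the desired confidence coefficient; it is passed in by the caller.
    """
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance s^2
    half_width = t_crit * math.sqrt(s2 / n)
    return mean - half_width, mean + half_width

# Hypothetical data; 2.776 is the tabulated two-sided 95% point for 4 degrees of freedom.
low, high = t_confidence_interval([4.8, 5.1, 5.0, 4.7, 5.4], t_crit=2.776)
print(round(low, 3), round(high, 3))
```

Passing the tabulated point as an argument keeps the sketch free of any particular statistical library.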

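In the particular example above the length of the interval is $(t_2 - t_1)s/\sqrt{n}$
which is shortest for a given $s$ if, for the given $\gamma$, $t_1$ and $t_2$ are chosen
so that $t_2 = -t_1 = t_{n-1,\gamma}$ where $t_{n-1,\gamma}$ satisfies

(10.2.10)    $\int_{-t_{n-1,\gamma}}^{t_{n-1,\gamma}} f_{n-1}(t)\,dt = \gamma$

in which case the $100\gamma\%$ confidence interval for $\mu$ is the following interval
centered at $\bar{x}$:

(10.2.11)    $\left(\bar{x} - t_{n-1,\gamma}\frac{s}{\sqrt{n}},\ \bar{x} + t_{n-1,\gamma}\frac{s}{\sqrt{n}}\right).$

Similarly, if $(x_1, \ldots, x_n)$ is a sample from a population having the
distribution $N(\mu, \sigma^2)$ we can set up an interval estimator for $\sigma^2$ from the
fact that $(n - 1)s^2/\sigma^2$ has the chi-square distribution $C(n - 1)$ (see 8.4.2).
For we have

(10.2.12)    $P\left(\chi_1^2 < \frac{(n-1)s^2}{\sigma^2} < \chi_2^2\right) = \int_{\chi_1^2}^{\chi_2^2} dF_{n-1}(\chi^2) = \gamma$

where $dF_{n-1}(\chi^2)$ is given by (7.8.1), and $\chi_1^2$ and $\chi_2^2$ are chosen so that the
integral in (10.2.12) has the value $\gamma$. But (10.2.12) is equivalent to

            $P\left(\frac{(n-1)s^2}{\chi_2^2} < \sigma^2 < \frac{(n-1)s^2}{\chi_1^2}\right) = \gamma$

from which it is evident that $((n-1)s^2/\chi_2^2,\ (n-1)s^2/\chi_1^2)$ is a $100\gamma\%$
confidence interval for $\sigma^2$. There are many ways, of course, of choosing $\chi_1^2$
and $\chi_2^2$ to satisfy (10.2.12). In practice, they are usually chosen so that

(10.2.13)    $\int_0^{\chi_1^2} dF_{n-1}(\chi^2) = \int_{\chi_2^2}^{\infty} dF_{n-1}(\chi^2) = \frac{1-\gamma}{2}$

although the confidence interval with shortest mean length is obtained if
$(1/\chi_1^2 - 1/\chi_2^2)$ is minimized subject to the condition that (10.2.12) be
satisfied.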
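The interval for the variance can be sketched in the same way, again assuming that the two chi-square points are read from a table and supplied by the caller. The function name and data are hypothetical; 0.484 and 11.143 are the tabulated 2.5% and 97.5% points for 4 degrees of freedom, that is, the equal-tails choice.

```python
def variance_confidence_interval(sample, chi2_low, chi2_high):
    """Return ((n-1)s^2/chi2_high, (n-1)s^2/chi2_low), an interval for sigma^2.

    chi2_low < chi2_high are points of the chi-square distribution C(n - 1)
    enclosing probability gamma; both are passed in by the caller.
    """
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # equals (n - 1) * s^2
    return ss / chi2_high, ss / chi2_low

low, high = variance_confidence_interval(
    [4.8, 5.1, 5.0, 4.7, 5.4], chi2_low=0.484, chi2_high=11.143
)
print(round(low, 4), round(high, 4))
```

Note that the larger chi-square point produces the lower end of the interval, because it appears in the denominator.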


10.3 ESTIMATORS FOR PARAMETERS IN 
LINEAR REGRESSION ANALYSIS 

(a) Estimators for Regression Coefficients 

We shall now consider a generalization of 10.2.1 which arises in estimation
problems of regression analysis, experimental designs, and related
problems. Suppose $y_1, \ldots, y_n$ are independent random variables
having variances all equal to $\sigma^2$ but with means given by the regression
function

(10.3.1)    $\mathcal{E}(y_\xi) = \beta_1 x_{1\xi} + \cdots + \beta_k x_{k\xi},$

$\xi = 1, \ldots, n$, where the $(x_{1\xi}, \ldots, x_{k\xi})$, $\xi = 1, \ldots, n$, are known (real)
vectors but $\beta_1, \ldots, \beta_k$ are unknown (real) parameters, called regression
coefficients, to be estimated. The parameter $\sigma^2$, usually unknown, is
called the residual variance. It is convenient to introduce $x_1, \ldots, x_k$
and refer to them as fixed variables as contrasted with random variables,
in which case $(x_{1\xi}, \ldots, x_{k\xi})$, $\xi = 1, \ldots, n$, is a set of $n$ specified values
of these fixed variables. It is customary to take $x_{1\xi} = 1$, $\xi = 1, \ldots, n$,
but it will be convenient not to make this assignment at present. We
will show that under mild conditions minimum variance estimators exist
for $\beta_1, \ldots, \beta_k$ and a rather simple unbiased estimator exists for $\sigma^2$.

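First, consider the estimation of $\beta_1, \ldots, \beta_k$. Let $\mathcal{E}^{-1}(\beta_i)$ be an arbitrary
unbiased linear estimator for $\beta_i$, that is,

(10.3.2)    $\mathcal{E}^{-1}(\beta_i) = \sum_\xi c_{i\xi}\, y_\xi, \quad i = 1, \ldots, k.$

Then

(10.3.3)    $\mathcal{E}(\mathcal{E}^{-1}(\beta_i)) = \sum_\xi \sum_j c_{i\xi}\, x_{j\xi}\, \beta_j = \beta_i, \quad i = 1, \ldots, k,$

from which it is evident that the $c_{i\xi}$ must satisfy

(10.3.4)    $\sum_\xi c_{i\xi}\, x_{j\xi} = \delta_{ij}$

where $\delta_{ij}$ is the Kronecker $\delta$. The variance of $\mathcal{E}^{-1}(\beta_i)$ is given by

(10.3.5)    $\sigma^2(\mathcal{E}^{-1}(\beta_i)) = \sigma^2 \sum_\xi c_{i\xi}^2.$

Minimizing $\sigma^2(\mathcal{E}^{-1}(\beta_i))$ with respect to the $c_{i\xi}$, subject to condition (10.3.4),
one finds that $c_{i\xi}$ must be of the following form

(10.3.6)    $c_{i\xi} = \sum_j \lambda_{ij}\, x_{j\xi}, \quad i = 1, \ldots, k, \quad \xi = 1, \ldots, n$

where it will be seen that the $\lambda_{ij}$ must satisfy the equations

(10.3.7)    $\sum_{j'} \lambda_{ij'}\, a_{j'j} = \delta_{ij}$

where

(10.3.8)    $a_{ij} = \sum_\xi x_{i\xi}\, x_{j\xi}.$

If the matrix $\|a_{ij}\|$ is nonsingular, which will be true if and only if
$(x_{i1}, \ldots, x_{in})$, $i = 1, \ldots, k$, are linearly independent vectors, then it is
evident that the solution of (10.3.7) is

(10.3.9)    $\|\lambda_{ij}\| = \|a_{ij}\|^{-1} = \|a^{ij}\|.$

Therefore, the minimum variance linear estimator for $\beta_i$, which we denote
by $b_i$, is

(10.3.10)    $b_i = \sum_j a^{ij} a_{j0}$

where $a_{j0}$, $j = 1, \ldots, k$, are random variables defined by

(10.3.11)    $a_{j0} = \sum_\xi x_{j\xi}\, y_\xi = a_{0j}.$

To find the variance of $b_i$, we substitute from (10.3.9) into (10.3.6)
and, in turn, substitute into (10.3.5). This gives

(10.3.12)    $\sigma^2(b_i) = \sigma^2 \sum_{j,j'=1}^{k} a^{ij} a^{ij'} a_{jj'} = a^{ii}\sigma^2.$

One similarly finds the covariance matrix of the $b_i$, $i = 1, \ldots, k$, to be

(10.3.13)    $\|a^{ij}\sigma^2\|.$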

Summarizing, we have the Markov (1900) theorem: 

10.3.1  Suppose $y_\xi$, $\xi = 1, \ldots, n$, are independent random variables with
means $\sum_{i=1}^{k} \beta_i x_{i\xi}$, $\xi = 1, \ldots, n$, and with variances all equal to
$\sigma^2$, where $(x_{i1}, \ldots, x_{in})$, $i = 1, \ldots, k$, are known and are linearly
independent vectors. The minimum variance linear estimators of the
regression coefficients $\beta_i$ are $b_i$, $i = 1, \ldots, k$, where the $b_i$ are defined
by (10.3.10). The covariance matrix of the $b_i$ is $\|a^{ij}\sigma^2\|$ where
$a_{ij} = \sum_\xi x_{i\xi} x_{j\xi}$, $i, j = 1, \ldots, k$, and $\|a^{ij}\| = \|a_{ij}\|^{-1}$.

If we minimize the sum of squares

(10.3.14)    $Q = \sum_\xi (y_\xi - \beta_1 x_{1\xi} - \cdots - \beta_k x_{k\xi})^2$

with respect to $\beta_1, \ldots, \beta_k$, we find that if $\|a_{ij}\|$ is nonsingular, the least
squares estimators for the $\beta_i$, that is, the values of the $\beta_i$ which minimize
(10.3.14), are the $b_i$. The details are straightforward and are left to the
reader. Hence we have the following theorem on linear estimators for
regression coefficients:

10.3.2  Under the conditions of 10.3.1, the minimum variance linear
estimators of the regression coefficients $\beta_i$ are identically the same
as the least squares estimators of the $\beta_i$.

The minimum variance approach to the estimation of the $\beta_i$ is due to
Markov (1900), whereas the much earlier least squares method is due to 
Gauss (1809^2). The combination of 10.3.1 and 10.3.2 is commonly called 
the Gauss-Markov theorem. 
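The estimators of 10.3.1 amount to solving the linear system with matrix $\|a_{ij}\|$ and right-hand side $(a_{10}, \ldots, a_{k0})$. The following is a minimal sketch in exact rational arithmetic; the helper names `solve` and `regression_estimates` and the data are assumptions made for illustration only.

```python
from fractions import Fraction

def solve(matrix, rhs):
    """Solve matrix * b = rhs by Gauss-Jordan elimination in exact arithmetic."""
    k = len(matrix)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(k):
        pivot = next((r for r in range(col, k) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the matrix ||a_ij|| is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]      # scale the pivot row
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

def regression_estimates(x, y):
    """x[i] holds the n values of the i-th fixed variable; y holds y_1, ..., y_n."""
    k = len(x)
    # a_ij = sum over xi of x_i * x_j
    a = [[sum(p * q for p, q in zip(x[i], x[j])) for j in range(k)] for i in range(k)]
    # a_i0 = sum over xi of x_i * y
    a0 = [sum(p * q for p, q in zip(x[i], y)) for i in range(k)]
    return solve(a, a0)   # b = ||a_ij||^{-1} (a_10, ..., a_k0)

x = [[1, 1, 1, 1], [0, 1, 2, 3]]   # hypothetical fixed variables, first one constant
y = [1, 2, 2, 4]                   # hypothetical observations
b = regression_estimates(x, y)
print(b)  # [Fraction(9, 10), Fraction(9, 10)]
```

Solving the system directly, rather than forming the inverse matrix, is enough when only the estimates are wanted; the inverse is needed in addition when the covariance matrix is required.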

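In many regression problems $x_{1\xi} = 1$, $\xi = 1, \ldots, n$, that is, the mean of
$y_\xi$ is assumed to be $\beta_1 + \beta_2 x_{2\xi} + \cdots + \beta_k x_{k\xi}$. In this case, by applying
10.3.2, we see that the minimum variance linear estimators $b_i$ for the $\beta_i$
can be expressed as follows:

(10.3.15)    $b_1 = \bar{y} - b_2\bar{x}_2 - \cdots - b_k\bar{x}_k$
             $b_{i'} = \sum_{j'=2}^{k} A^{i'j'} A_{j'0}, \quad i' = 2, \ldots, k,$

where

(10.3.16)    $\bar{y} = \frac{1}{n}\sum_\xi y_\xi, \quad \bar{x}_{i'} = \frac{1}{n}\sum_\xi x_{i'\xi}, \quad i' = 2, \ldots, k,$

and

(10.3.17)    $A_{i'j'} = \sum_\xi (x_{i'\xi} - \bar{x}_{i'})(x_{j'\xi} - \bar{x}_{j'})$
             $A_{i'0} = \sum_\xi (x_{i'\xi} - \bar{x}_{i'})(y_\xi - \bar{y}), \quad i', j' = 2, \ldots, k,$

with $\|A^{i'j'}\| = \|A_{i'j'}\|^{-1}$.

Furthermore, by putting $x_{1\xi} = 1$, $\xi = 1, \ldots, n$, in $a_{ij}$ one finds from
(10.3.13) that the covariance matrix of the $b_{i'}$, $i' = 2, \ldots, k$, is

(10.3.18)    $\|A^{i'j'}\sigma^2\|, \quad i', j' = 2, \ldots, k$

whereas

(10.3.19)    $\sigma^2(b_1) = \left[\frac{1}{n} + \sum_{i',j'=2}^{k} A^{i'j'}\bar{x}_{i'}\bar{x}_{j'}\right]\sigma^2$
             $\sigma(b_1, b_{i'}) = -\left[\sum_{j'=2}^{k} A^{i'j'}\bar{x}_{j'}\right]\sigma^2.$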


Remarks on Gauss-Markov Theorem and Weighing Problems. Suppose in
(10.3.1) that $\beta_1, \ldots, \beta_k$ represent the “true” weights of $k$ objects, say, $O_1, \ldots, O_k$
respectively. Consider weighing various combinations of these objects on a
chemical balance (scales with right and left weighing pans). Let $x_{i\xi} = +1$ if
$O_i$ is placed on left pan, $x_{i\xi} = -1$ if $O_i$ is placed on right pan, and $x_{i\xi} = 0$ if
$O_i$ is not weighed, $i = 1, \ldots, k$. Then $\beta_1 x_{1\xi} + \cdots + \beta_k x_{k\xi}$ is the “true” reading
on the $\xi$th weighing of the set of objects in accordance with the configuration
determined by $x_{1\xi}, \ldots, x_{k\xi}$. For $n$ different weighings the weighing design matrix
$\|x_{i\xi}\|$, $i = 1, \ldots, k$, $\xi = 1, \ldots, n$, consists only of $-1$'s, $0$'s, and $+1$'s.

If the scales are bias-free the “actual” reading $y_\xi$ on the $\xi$th weighing may be
considered as a random variable whose mean value is $\beta_1 x_{1\xi} + \cdots + \beta_k x_{k\xi}$.
If we assume that $y_1, \ldots, y_n$ are the $n$ random variables one obtains in $n$
weighings, and if we assume these random variables to be independent with
equal variances, namely, $\sigma^2$, then the minimum variance estimators $b_1, \ldots, b_k$
of the “true” weights $\beta_1, \ldots, \beta_k$ of the objects $O_1, \ldots, O_k$ are given by (10.3.10)
and the covariance matrix of these estimators is given by (10.3.13). Note that
$k$ is the smallest value of $n$ for which it is always possible to find a weighing
design matrix which will yield estimators for all $\beta$'s. The problem of constructing
weighing design matrices so as to provide vector estimators $(b_1, \ldots, b_k)$ for
$(\beta_1, \ldots, \beta_k)$ in various “best” senses has been extensively investigated by
Hotelling (1944), Kishen (1945), Mood (1946), and others.
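The effect of the design on the variances can be seen in a small case. The sketch below, whose designs and names are illustrative assumptions, compares weighing three objects one at a time with weighing all three together in four weighings. In both designs the matrix of sums of products is diagonal, so each variance is simply $\sigma^2$ divided by the corresponding diagonal element.

```python
def moment_matrix(design):
    """a_ij = sum over weighings of x_i * x_j; design[i] is the row for object O_i."""
    k = len(design)
    return [[sum(p * q for p, q in zip(design[i], design[j])) for j in range(k)]
            for i in range(k)]

# One object per weighing (n = 3) versus all three objects in each of n = 4 weighings.
one_at_a_time = [[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]]
all_together = [[1, 1, -1, -1],
                [1, -1, 1, -1],
                [1, -1, -1, 1]]

for name, design in [("one at a time", one_at_a_time), ("all together", all_together)]:
    a = moment_matrix(design)
    # Both moment matrices are diagonal here, so the inverse has elements 1 / a_ii.
    assert all(a[i][j] == 0 for i in range(3) for j in range(3) if i != j)
    print(name, [f"sigma^2/{a[i][i]}" for i in range(3)])
```

With one extra weighing, the second design reduces the variance of every estimated weight from $\sigma^2$ to $\sigma^2/4$.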


(b) Estimator for the Residual Variance 

Now we consider the problem of constructing an unbiased quadratic
estimator for the residual variance $\sigma^2$. Let

(10.3.20)    $z_\xi = y_\xi - \sum_{i=1}^{k} \beta_i x_{i\xi}, \quad \xi = 1, \ldots, n$

(10.3.21)    $S = \sum_{\xi=1}^{n} z_\xi^2$

(10.3.22)    $S_1 = \sum_{\xi=1}^{n} \left(y_\xi - \sum_{i=1}^{k} b_i x_{i\xi}\right)^2$
             $S_2 = \sum_{i,j=1}^{k} a_{ij}(b_i - \beta_i)(b_j - \beta_j).$

It is seen that

(10.3.23)    $S = S_1 + S_2.$

But

(10.3.24)    $\mathcal{E}(S) = n\sigma^2$

and making use of the covariance matrix (10.3.13) we have

(10.3.25)    $\mathcal{E}(S_2) = \sum_{i,j=1}^{k} a_{ij}\, a^{ij} \sigma^2 = k\sigma^2.$

Since $\mathcal{E}(S) = \mathcal{E}(S_1) + \mathcal{E}(S_2)$ we therefore obtain

(10.3.26)    $\mathcal{E}(S_1) = (n - k)\sigma^2.$

Since $S_1$ is free of the parameters $\beta_1, \ldots, \beta_k$ and $\sigma^2$, and hence observable,
we therefore have the following result:

10.3.3  Under the conditions of 10.3.1, $S_1/(n - k)$ is an unbiased estimator
for $\sigma^2$.

Note that the mean values of $S$, $S_1$, and $S_2$ are respectively $n\sigma^2$, $(n - k)\sigma^2$,
$k\sigma^2$. The numbers $n$, $n - k$, $k$ are referred to as degrees of freedom of $S$,
$S_1$, and $S_2$.

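If the expression for $S_1$ in (10.3.22) is squared and summed over $\xi$, we find

(10.3.27)    $S_1 = a_{00} - \sum_i b_i a_{i0} = a_{00} - \sum_{i,j} a^{ij} a_{i0} a_{j0}$

that is, $S_1$ can be written in the relatively simple form

(10.3.27a)    $S_1 = \frac{|a_{pq}|}{|a_{ij}|}$

where $a_{00} = \sum_\xi y_\xi^2$ and $|a_{pq}|$, $p, q = 0, 1, \ldots, k$, is the determinant

(10.3.28)    $|a_{pq}| = \begin{vmatrix} a_{00} & a_{01} & \cdots & a_{0k} \\ a_{10} & a_{11} & \cdots & a_{1k} \\ \vdots & \vdots & & \vdots \\ a_{k0} & a_{k1} & \cdots & a_{kk} \end{vmatrix}.$

The equivalence of (10.3.27) and (10.3.27a) is evident if one performs a
bordered expansion of the determinant given in (10.3.28) by the first row
and first column [for example, see Bocher (1907)].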
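The agreement between the sum of squared residuals and the ratio of the bordered determinant (10.3.28) to the determinant of $\|a_{ij}\|$ can be confirmed numerically. The following sketch does so for $k = 2$ in exact arithmetic; the data and the helper names `dot`, `det2`, and `det3` are hypothetical.

```python
from fractions import Fraction

def dot(u, v):
    return sum(Fraction(p) * q for p, q in zip(u, v))

def det2(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

x1, x2 = [1, 1, 1, 1], [0, 1, 2, 3]   # hypothetical fixed variables
y = [1, 2, 2, 4]                      # hypothetical observations

a = [[dot(x1, x1), dot(x1, x2)], [dot(x2, x1), dot(x2, x2)]]
a10, a20, a00 = dot(x1, y), dot(x2, y), dot(y, y)

# Estimators (10.3.10), using the explicit inverse of the 2 x 2 matrix.
d = det2(a)
b1 = (a[1][1] * a10 - a[0][1] * a20) / d
b2 = (a[0][0] * a20 - a[1][0] * a10) / d

# S_1 as a sum of squared residuals, and as a ratio of determinants.
s1_residuals = sum((yv - b1 * u - b2 * v) ** 2 for yv, u, v in zip(y, x1, x2))
bordered = [[a00, a10, a20], [a10, a[0][0], a[0][1]], [a20, a[1][0], a[1][1]]]
s1_determinants = det3(bordered) / d

print(s1_residuals, s1_determinants)   # 7/10 7/10
print(s1_residuals / (len(y) - 2))     # 7/20, the estimate S_1/(n - k)
```

The determinant form needs only the sums of squares and products, not the individual residuals.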

If some of the regression coefficients, say $\beta_{k_1+1}, \ldots, \beta_k$, have
known values, we can replace $y_\xi$ by $y'_\xi$ in the preceding paragraphs,
where $y'_\xi = (y_\xi - \beta_{k_1+1} x_{k_1+1,\xi} - \cdots - \beta_k x_{k\xi})$, and carry the analysis
through with $k$ replaced by $k_1$, thus obtaining trivially modified forms of
10.3.1, 10.3.2, and 10.3.3.

(c) Distributions of Regression Estimators in Normal Regression Theory

If we make the further assumption that the random variables $y_\xi$,
$\xi = 1, \ldots, n$, are independent with distributions $N(\beta_1 x_{1\xi} + \cdots + \beta_k x_{k\xi}, \sigma^2)$
we obtain some results of considerable importance in applied statistics.
Let $z_\xi$ be defined by (10.3.20). Then the p.e. of the random variables
$z_\xi$, $\xi = 1, \ldots, n$, is

(10.3.29)    $dF(z_1, \ldots, z_n) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^n \exp\left(-\frac{1}{2\sigma^2}\sum_\xi z_\xi^2\right) dz_1 \cdots dz_n.$

Referring to (10.3.22) we find that $S_1$ and $S_2$ can be written as

(10.3.30)    $S_1 = \sum_\xi z_\xi^2 - \sum_{i,j} a^{ij}\left(\sum_\xi x_{i\xi} z_\xi\right)\left(\sum_\eta x_{j\eta} z_\eta\right)$
             $S_2 = \sum_{i,j} a^{ij}\left(\sum_\xi x_{i\xi} z_\xi\right)\left(\sum_\eta x_{j\eta} z_\eta\right)$

which are quadratic forms in the $z_\xi$, $\xi = 1, \ldots, n$, having matrices of
ranks $n - k$ and $k$ respectively. Since $S$ is a quadratic form having
matrix of rank $n$ in the $z_\xi$, $\xi = 1, \ldots, n$, it follows by Cochran’s theorem
8.4.4 that $S_1/\sigma^2$ and $S_2/\sigma^2$ are independently distributed according to
chi-square distributions $C(n - k)$ and $C(k)$ respectively. Furthermore,
we know from (10.3.10) and (10.3.11) that the $b_i$ are linear functions of the
$y_\xi$, namely,

(10.3.31)    $b_i = \sum_\xi \sum_j a^{ij} x_{j\xi}\, y_\xi, \quad i = 1, \ldots, k$

having means $\beta_i$ and covariance matrix given by (10.3.13). Hence, by
7.4.4, if $\|a_{ij}\|$ is nonsingular, the $b_i$ have the $k$-dimensional normal
distribution $N(\{\beta_i\}, \|a^{ij}\sigma^2\|)$. As a matter of fact, $S_1$ and $(b_1, \ldots, b_k)$ are
two independent sets of random variables under our assumption of
normality. This can be established by evaluating the characteristic
function of $(S_1, b_1, \ldots, b_k)$ which turns out to be

(10.3.32)    $(1 - 2i\sigma^2 t)^{-(n-k)/2} \cdot \exp\left(-\frac{\sigma^2}{2}\sum_{i,j} a^{ij} u_i u_j + i\sum_i \beta_i u_i\right).$

Therefore, we have the following important theorem in normal regression 
theory: 

10.3.4  If the random variables $y_\xi$, $\xi = 1, \ldots, n$, are independent with
distributions $N(\beta_1 x_{1\xi} + \cdots + \beta_k x_{k\xi}, \sigma^2)$, $\xi = 1, \ldots, n$, where the
matrix $\|a_{ij}\|$, $i, j = 1, \ldots, k$, defined by (10.3.8) is nonsingular,
then:

(i) The $b_i$ defined by (10.3.10) are unbiased linear estimators
for the regression coefficients $\beta_i$, $i = 1, \ldots, k$, and have
the $k$-dimensional distribution $N(\{\beta_i\}, \|a^{ij}\sigma^2\|)$;

(ii) $S_1$ and $(b_1, \ldots, b_k)$ are independent sets of random
variables;

(iii) $S_1/\sigma^2$ and $S_2/\sigma^2$ are independent and have chi-square
distributions $C(n - k)$ and $C(k)$, respectively.


10.4 INTERVAL AND ELLIPSOIDAL ESTIMATORS FOR THE 
PARAMETERS IN NORMAL REGRESSION THEORY 


Since $S_1$ and $(b_1, \ldots, b_k)$ are two independently distributed sets of
random variables under the conditions of 10.3.4, it is evident that $S_1$ and
any given $b_i$ are independent. But $b_i$ has the distribution $N(\beta_i, a^{ii}\sigma^2)$ and
$S_1/\sigma^2$ has the chi-square distribution $C(n - k)$. Therefore, from 7.8.3
it follows that for any $b_i$,

(10.4.1)    $\frac{b_i - \beta_i}{\sqrt{a^{ii} S_1/(n - k)}}$

has the “Student” distribution $S(n - k)$, and it follows from an argument
similar to that by which the confidence interval (10.2.11) was established
that

(10.4.2)    $b_i \pm t_{n-k,\gamma}\sqrt{\frac{a^{ii} S_1}{n - k}}$

is a $100\gamma\%$ confidence interval for $\beta_i$ where $t_{n-k,\gamma}$ satisfies (10.2.10) with
$n - 1$ replaced by $n - k$.

If we set up the confidence intervals (10.4.2) for $i = 1, \ldots, k$ it is
evident that the number of confidence intervals covering their respective
$\beta$'s has mean value $\gamma k$. But this says little about the probability of the
confidence intervals simultaneously covering their respective $\beta$'s (unless the
$b$'s are independent, which will occur only if $\|a_{ij}\|$ is a diagonal matrix).
The question then arises whether one can establish some kind of a simple
random region $R_\gamma$ in the $k$-dimensional $\beta$-space such that the probability
is $\gamma$ that $R_\gamma$ covers the parameter point $(\beta_1, \ldots, \beta_k)$. In this instance a
region can be readily found from the fact that $S_1/\sigma^2$ and $S_2/\sigma^2$ are
independently distributed according to chi-square distributions $C(n - k)$ and
$C(k)$, respectively. For it follows from 7.8.5 that

(10.4.3)    $\frac{(n - k)S_2}{kS_1}$

has the Snedecor distribution $S(k, n - k)$. Substituting for $S_2$ from
(10.3.22) we have

(10.4.4)    $P\left(\frac{n - k}{kS_1}\sum_{i,j=1}^{k} a_{ij}(b_i - \beta_i)(b_j - \beta_j) < \mathcal{F}_{k,n-k,\gamma}\right) = \int_0^{\mathcal{F}_{k,n-k,\gamma}} dF_{k,n-k}(\mathcal{F}) = \gamma$

where $dF_{k,n-k}(\mathcal{F})$ is the p.e. of the Snedecor distribution $S(k, n - k)$, whose
general form was given by (7.8.9). But (10.4.4) can be stated as follows:

(10.4.5)    $P((\beta_1, \ldots, \beta_k) \in R_\gamma) = \gamma$

where $R_\gamma$ is the interior of a random ellipsoid in the $\beta$-space centered at
$(b_1, \ldots, b_k)$ and having equation

(10.4.6)    $\sum_{i,j=1}^{k} a_{ij}(\beta_i - b_i)(\beta_j - b_j) = \frac{kS_1\mathcal{F}_{k,n-k,\gamma}}{n - k}.$

This ellipsoid is an example of a $100\gamma\%$ confidence region for the parameter
point $(\beta_1, \ldots, \beta_k)$. We may refer to the ellipsoid (10.4.6) as a region
estimator for $(\beta_1, \ldots, \beta_k)$. In a similar manner we can set up a region
estimator for any subset of the $\beta_i$. This is left as an exercise for the reader.
For a single $\beta$, say $\beta_i$, the confidence interval estimator (10.4.2) is, of course,
an interval (one-dimensional region) estimator.

The notion of a confidence ellipsoid was introduced by Hotelling (1931) 

in connection with his generalized “Student” distribution which is 

discussed in Chapter 18. 
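Whether a proposed parameter point lies inside the confidence ellipsoid (10.4.6) is a direct computation once the required point of the Snedecor distribution is known. The sketch below assumes that point is supplied by the caller; the function name and the numerical values (a hypothetical fit with $k = 2$, $n = 4$, $S_1 = 0.7$, and the tabulated 95% point 19.0 of $S(2, 2)$) are illustrative only.

```python
def in_confidence_ellipsoid(beta, b, a, s1, n, f_crit):
    """True if beta lies inside the ellipsoid (10.4.6) centered at b.

    a is the matrix ||a_ij||, s1 is S_1, and f_crit is the point of the
    Snedecor distribution S(k, n - k) for the chosen confidence coefficient.
    """
    k = len(b)
    d = [beta[i] - b[i] for i in range(k)]
    quad = sum(a[i][j] * d[i] * d[j] for i in range(k) for j in range(k))
    return quad < k * s1 * f_crit / (n - k)

a = [[4, 6], [6, 14]]
b = [0.9, 0.9]
print(in_confidence_ellipsoid([1.9, 0.9], b, a, s1=0.7, n=4, f_crit=19.0))  # True
print(in_confidence_ellipsoid([2.9, 0.9], b, a, s1=0.7, n=4, f_crit=19.0))  # False
```

Here the right-hand side of (10.4.6) is $2(0.7)(19.0)/2 = 13.3$, and the two test points give quadratic forms of about 4 and 16.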

10.5 SIMULTANEOUS CONFIDENCE INTERVALS: 
MULTIPLE COMPARISONS 

In the preceding discussion we have seen how the vector of sample
regression coefficients $(b_1, \ldots, b_k)$ can be used for constructing a confidence
ellipsoid for the vector of population regression coefficients
$(\beta_1, \ldots, \beta_k)$. In some problems, particularly in analysis of variance problems,
such as those to be discussed in subsequent sections, it is desirable to
establish confidence intervals which hold simultaneously for a large
number of linear combinations (with known coefficients) of the components
of $k$-dimensional normal random variables having a known or
observable covariance matrix. This problem has been considered by
Duncan (1952), Dwass (1959), Roy and Bose (1953), Roy (1954), Scheffé
(1953), Tukey (1953), and others.

(a) A Probability Inequality for Simultaneous Confidence Intervals 

First we shall consider an inequality for the probability that $h$ parameters
are simultaneously contained in $h$ respective confidence intervals. Suppose
$\mu_1, \ldots, \mu_h$ are unknown parameters and let $(\underline{\mu}_1, \bar{\mu}_1), \ldots, (\underline{\mu}_h, \bar{\mu}_h)$ be
random intervals, each having confidence coefficient $1 - (1 - \gamma)/h$. Let
$E_i$ be the event that $(\underline{\mu}_i, \bar{\mu}_i)$ contains $\mu_i$ and $\bar{E}_i$ its complement, $i = 1, \ldots, h$.
Then we have

            $P(\bar{E}_i) = \frac{1}{h}(1 - \gamma), \quad i = 1, \ldots, h.$

The probability that all events $E_1, \ldots, E_h$ occur simultaneously is
$P(E_1 \cap \cdots \cap E_h)$. But

(10.5.1)    $P(E_1 \cap \cdots \cap E_h) = 1 - P(\bar{E}_1 \cup \cdots \cup \bar{E}_h)$

and

(10.5.2)    $P(\bar{E}_1 \cup \cdots \cup \bar{E}_h) \le P(\bar{E}_1) + \cdots + P(\bar{E}_h).$

Hence

(10.5.3)    $P(E_1 \cap \cdots \cap E_h) \ge 1 - [P(\bar{E}_1) + \cdots + P(\bar{E}_h)]$

that is,

(10.5.4)    $P(E_1 \cap \cdots \cap E_h) \ge \gamma.$

We may summarize in the following result due to Tukey (1953):

10.5.1  Suppose $\mu_1, \ldots, \mu_h$ are unknown parameters and $(\underline{\mu}_1, \bar{\mu}_1), \ldots,$
$(\underline{\mu}_h, \bar{\mu}_h)$ are $100[1 - \frac{1}{h}(1 - \gamma)]\%$ confidence intervals for $\mu_1, \ldots, \mu_h$
respectively. Then the probability is at least $\gamma$ that these confidence
intervals simultaneously contain $\mu_1, \ldots, \mu_h$ respectively.
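The arithmetic of 10.5.1 is simple enough to state as two small functions. This is a sketch with hypothetical names: one returns the confidence coefficient each individual interval must have, and the other returns the lower bound of (10.5.3) on the joint coverage for any set of individual coefficients.

```python
from fractions import Fraction

def individual_coefficient(gamma, h):
    """Confidence coefficient 1 - (1 - gamma)/h required of each of the h intervals."""
    return 1 - (1 - Fraction(gamma)) / h

def joint_lower_bound(coefficients):
    """Lower bound (10.5.3): 1 minus the sum of the individual non-coverage probabilities."""
    return 1 - sum(1 - Fraction(c) for c in coefficients)

each = individual_coefficient(Fraction(95, 100), 5)
print(each)                           # 99/100
print(joint_lower_bound([each] * 5))  # 19/20
```

Thus five intervals, each with confidence coefficient 0.99, hold simultaneously with probability at least 0.95, whatever the dependence among them.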

We now consider the problem of the simultaneous fulfillment of large
numbers of confidence intervals under some special conditions particularly
applicable to the Model I analysis of variance problems to be discussed
in subsequent sections.

(b) Scheffé’s Method

A basic result due to Scheffé (1953) can be stated in its essentials as
follows:

10.5.2  Suppose $(u_1, \ldots, u_k)$ is a $k$-dimensional random variable having
distribution

            $N(\{\mu_i\}, \|\sigma^2 a_{ij}\|), \quad i, j = 1, \ldots, k,$

where $\|a_{ij}\|$ is nonsingular, symmetric and known, and $\sigma^2$ is unknown.
Let $v/\sigma^2$ be a random variable independent of $(u_1, \ldots, u_k)$ and
having the chi-square distribution with $m$ degrees of freedom. Let
$\mathcal{F}_{k,m,\gamma}$ be the $100\gamma\%$ point of the Snedecor distribution $S(k, m)$
and $\delta = \sqrt{kv\mathcal{F}_{k,m,\gamma}/m}$. If $\mathscr{C}$ is the set of all real vectors $(c_1, \ldots, c_k)$,
where $c_1, \ldots, c_k$ are not all zero, the probability is $\gamma$ that the
inequalities

(10.5.5)    $\sum_i c_i u_i - \delta\sqrt{\sum_{i,j} a_{ij} c_i c_j} < \sum_i c_i \mu_i < \sum_i c_i u_i + \delta\sqrt{\sum_{i,j} a_{ij} c_i c_j}$

hold simultaneously for all $(c_1, \ldots, c_k)$ in $\mathscr{C}$.

Note that in the trivial case where $c_1, \ldots, c_k$ are all zero (which is not
included in $\mathscr{C}$), (10.5.5) holds with probability 1.

To prove 10.5.2 we note that $\frac{1}{\sigma^2}\sum_{i,j} a^{ij}(u_i - \mu_i)(u_j - \mu_j)$ and $v/\sigma^2$ are
independent random variables having chi-square distributions with $k$ and
$m$ degrees of freedom respectively. Hence

            $\frac{m}{kv}\sum_{i,j} a^{ij}(u_i - \mu_i)(u_j - \mu_j)$

has the Snedecor distribution $S(k, m)$. Therefore if $\mathcal{F}_{k,m,\gamma}$ is the $100\gamma\%$
point of this distribution we have

(10.5.6)    $P\left(\sum_{i,j} a^{ij}(u_i - \mu_i)(u_j - \mu_j) < \delta^2\right) = \gamma$

where

            $\delta^2 = \frac{kv}{m}\mathcal{F}_{k,m,\gamma}.$

It will be useful from now on to make use of $k$-dimensional geometric
concepts and terminology.

The set of points in the space of $(\mu_1, \ldots, \mu_k)$ for which

(10.5.7)    $\sum_{i,j} a^{ij}(\mu_i - u_i)(\mu_j - u_j) < \delta^2$

is the interior of a $100\gamma\%$ confidence ellipsoid for the true parameter
point $(\mu_1, \ldots, \mu_k)$ which is centered at $(u_1, \ldots, u_k)$. If we consider the
set of points in the space of $(\mu_1, \ldots, \mu_k)$ contained between all possible
pairs of parallel $(k - 1)$-dimensional hyperplanes tangent to this ellipsoid,
this set of points constitutes the interior of the ellipsoid (10.5.6) and the
probability associated with this set is, of course, $\gamma$. It now remains to
show that for any particular choice of $(c_1, \ldots, c_k)$ in $\mathscr{C}$ the two parallel
$(k - 1)$-dimensional hyperplanes in the space of $(\mu_1, \ldots, \mu_k)$ having
equations

(10.5.8)    $\sum_i c_i \mu_i = \sum_i c_i u_i \pm \delta\sqrt{\sum_{i,j} a_{ij} c_i c_j}$

are tangent to the ellipsoid

(10.5.9)    $\sum_{i,j} a^{ij}(\mu_i - u_i)(\mu_j - u_j) = \delta^2.$

It is evident that any point $(\mu_1, \ldots, \mu_k)$ between the two hyperplanes
(10.5.8) satisfies (10.5.5).

For the moment let $\mu_i - u_i = y_i$. Then (10.5.9) can be written as

(10.5.10)    $\sum_{i,j} a^{ij} y_i y_j = \delta^2$

and the equation of an arbitrary hyperplane in the space of $(y_1, \ldots, y_k)$ as

(10.5.11)    $\sum_i c_i y_i = d.$

We must find the two values of $d$ for which the hyperplane (10.5.11) is
tangent to the ellipsoid (10.5.10). Using a Lagrange multiplier $\lambda$, we must
find the stationary points in the $(y_1, \ldots, y_k)$-space of

(10.5.12)    $\Phi = -\frac{\lambda}{2}\left(\sum_{i,j} a^{ij} y_i y_j\right) + \sum_i c_i y_i.$

Differentiating with respect to $y_i$ we find

            $-\lambda\sum_j a^{ij} y_j + c_i = 0$

or

(10.5.13)    $y_i = \frac{1}{\lambda}\sum_j a_{ij} c_j.$

Substituting in (10.5.10) we find

(10.5.14)    $\lambda = \pm\frac{1}{\delta}\sqrt{\sum_{i,j} a_{ij} c_i c_j}.$

From (10.5.14), (10.5.13), and (10.5.11), we find

            $d = \pm\delta\sqrt{\sum_{i,j} a_{ij} c_i c_j}.$

Putting this value of $d$ in (10.5.11) and using the fact that $y_i = \mu_i - u_i$
we obtain (10.5.8) as the equations of the two tangent hyperplanes for
specified $(c_1, \ldots, c_k)$.

Finally, note that if we take only a finite number $N$ of the vectors in
$\mathscr{C}$, the set of points in the $(\mu_1, \ldots, \mu_k)$-space which lie between all $N$
pairs of hyperplanes corresponding to these $N$ vectors is a random
$k$-dimensional polyhedron $G_k$ which circumscribes the ellipsoid (10.5.9).
Since $G_k$ contains the ellipsoid, the probability contained in $G_k$ exceeds
that in the ellipsoid, namely, $\gamma$. Hence, the probability exceeds $\gamma$ that any
finite number $N$ of the inequalities corresponding to $N$ vectors in $\mathscr{C}$ are
simultaneously fulfilled. For example, if we take $c_i = 1$, $c_j = -1$ and
all other $c$'s equal to 0, $i > j = 1, \ldots, k$, we obtain a subset of $N = k(k - 1)/2$
vectors in $\mathscr{C}$. Then the probability exceeds $\gamma$ that all of the
following $k(k - 1)/2$ inequalities hold simultaneously

(10.5.5a)    $(u_i - u_j) - \delta\sqrt{a_{ii} + a_{jj} - 2a_{ij}} < \mu_i - \mu_j < (u_i - u_j) + \delta\sqrt{a_{ii} + a_{jj} - 2a_{ij}},$
             $i > j = 1, \ldots, k.$
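In computational terms, each interval in (10.5.5) has half-width $\delta$ times the square root of the quadratic form in the $c$'s. The following sketch evaluates that half-width for a few contrasts, assuming the point of the Snedecor distribution is supplied by the caller. The function name and all numerical values are hypothetical; 3.49 is the tabulated 95% point of $S(3, 12)$.

```python
import math

def scheffe_half_width(c, a, v, m, f_crit):
    """Half-width delta * sqrt(sum a_ij c_i c_j) of the interval (10.5.5).

    a is the known matrix ||a_ij||, v/sigma^2 is chi-square with m degrees of
    freedom, and f_crit is the point of S(k, m), supplied by the caller.
    """
    k = len(c)
    delta = math.sqrt(k * v * f_crit / m)
    quad = sum(a[i][j] * c[i] * c[j] for i in range(k) for j in range(k))
    return delta * math.sqrt(quad)

# Hypothetical case: three independent means of four observations each.
a = [[0.25, 0, 0], [0, 0.25, 0], [0, 0, 0.25]]
for c in [(1, -1, 0), (1, 0, -1), (0, 1, -1), (1, -0.5, -0.5)]:
    print(c, round(scheffe_half_width(c, a, v=24.0, m=12, f_crit=3.49), 3))
```

The same value of $\delta$ serves every vector $c$, which is what allows the intervals to hold simultaneously.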


(c) Tukey’s Method 

Results similar to those in (10.5.5a) for the case of all possible differences
$\mu_i - \mu_j$, $i > j = 1, \ldots, k$, but using a “Studentized range” estimator for
$\sigma$, have been obtained by Tukey (1953). Suppose $(z_1, \ldots, z_k)$ is a sample
of size $k$ from $N(0, a^2\sigma^2)$ where $a^2$ is a known positive constant and let
$v/\sigma^2$ be a random variable independent of $(z_1, \ldots, z_k)$ and having the
chi-square distribution with $m$ degrees of freedom. Let $R$ be the range of
$(z_1, \ldots, z_k)$, that is, $R = \max(z_1, \ldots, z_k) - \min(z_1, \ldots, z_k)$. The random
variable $\sqrt{m}\,R/(\sqrt{v}\,a)$ is called the Studentized range $R_{k,m}$. For a given
confidence coefficient $\gamma$ let $R_{k,m,\gamma}$ be the upper $100\gamma\%$ point of the
distribution of $R_{k,m}$ defined as follows:

(10.5.15)    $P(R_{k,m} < R_{k,m,\gamma}) = \gamma.$

Tukey’s (1953) result can be stated as follows:

10.5.3  Suppose $(u_1, \ldots, u_k)$ is a $k$-dimensional random variable having
distribution $N(\{\mu_i\}, \|\sigma^2 a_{ij}\|)$, where $a_{ii} = a^2$, $a_{ij} = a_{ji} = \rho a^2$,
$i \neq j = 1, \ldots, k$, $a^2$ and $\rho$ being known.
Let $v/\sigma^2$ be a random variable independent of $(u_1, \ldots, u_k)$ and
having the chi-square distribution with $m$ degrees of freedom. If
$\mathscr{C}_0$ is the set of real vectors $(c_1, \ldots, c_k)$ for which $\sum_i c_i = 0$, and
$c_1, \ldots, c_k$ are not all zero, then the probability is $\gamma$ that the
inequalities

(10.5.16)    $\sum_i c_i u_i - D < \sum_i c_i \mu_i < \sum_i c_i u_i + D$

hold simultaneously for all vectors in $\mathscr{C}_0$, where

(10.5.17)    $D = R_{k,m,\gamma}\, a\sqrt{v(1 - \rho)/m}\,\left(\tfrac{1}{2}\sum_i |c_i|\right).$

Note that for the trivial case in which $c_1, \ldots, c_k$ are all zero (which
is not included in $\mathscr{C}_0$), (10.5.16) holds with probability 1.

To prove 10.5.3 let $u_i - \mu_i = y_i$, $i = 1, \ldots, k$. Then $(y_1, \ldots, y_k)$ has
the distribution $N(\{0\}, \|\sigma^2 a_{ij}\|)$ where the $a_{ij}$ are as defined in 10.5.3. Let
$w = y_1 + \cdots + y_k$ and $z_i = y_i + hw$ where $h$ is a constant satisfying the
quadratic equation (where $-1/(k - 1) < \rho < 1$ since $\|a_{ij}\|$ is positive
definite)

(10.5.18)    $h^2 k[1 + (k - 1)\rho] + 2h[1 + (k - 1)\rho] + \rho = 0.$

Then $z_1, \ldots, z_k$ are independent, each having the distribution
$N(0, \sigma^2 a^2(1 - \rho))$. Now consider the inequalities

(10.5.19)    $|z_i - z_j| < H, \quad i, j = 1, \ldots, k.$

These inequalities are all satisfied if and only if the range $R$ of $(z_1, \ldots, z_k)$
satisfies

(10.5.20)    $R < H.$

Recalling that $v/\sigma^2$ has a chi-square distribution with $m$ degrees of freedom
and is independent of $(z_1, \ldots, z_k)$ and using the fact that $(z_1, \ldots, z_k)$ is a
sample of size $k$ from $N(0, \sigma^2 a^2(1 - \rho))$ it is seen that the random variable
$R\sqrt{m}/[a\sqrt{v(1 - \rho)}]$ is the Studentized range $R_{k,m}$. Now suppose $(c_1, \ldots, c_k)$
is a vector such that $c_1, \ldots, c_k$ are not all zero, but $\sum_i c_i = 0$. If we let
$\sum_i'$ denote summation over all values of $i$ for which $c_i > 0$, and similarly
let $\sum_j''$ denote summation over all $j$ for which $(-c_j) > 0$, then we may
write

(10.5.21)    $\sum_i{}' c_i = \sum_j{}'' (-c_j) = \tfrac{1}{2}\sum_i |c_i| = K.$

Furthermore

            $\sum_i c_i z_i = \sum_i{}' c_i z_i - \sum_j{}'' (-c_j) z_j$
            $= \frac{1}{K}\left[\sum_i{}'\sum_j{}'' c_i(-c_j) z_i - \sum_i{}'\sum_j{}'' c_i(-c_j) z_j\right].$

If $|z_i - z_j| < H$, $i, j = 1, \ldots, k$, we have

            $\left|\sum_i c_i z_i\right| = \frac{1}{K}\left|\sum_i{}'\sum_j{}'' c_i(-c_j)(z_i - z_j)\right|$
            $\le \frac{1}{K}\sum_i{}'\sum_j{}'' c_i(-c_j)\,|z_i - z_j| < HK.$

That is,

(10.5.22)    $\left|\sum_i c_i z_i\right| < H\left(\tfrac{1}{2}\sum_i |c_i|\right)$

which holds for all real vectors $(c_1, \ldots, c_k)$ in $\mathscr{C}_0$. But (10.5.22) holds
for all vectors $(c_1, \ldots, c_k)$ in $\mathscr{C}_0$ if and only if (10.5.19) holds, which holds
if and only if (10.5.20) holds. Hence

(10.5.23)    $P\left(\left|\sum_i c_i z_i\right| < H\left(\tfrac{1}{2}\sum_i |c_i|\right) \text{ for all } (c_1, \ldots, c_k) \text{ in } \mathscr{C}_0\right) = P(R < H).$

If $R_{k,m}$ denotes the Studentized range defined in (10.5.15), we have

(10.5.24)    $P(R < H) = P\left(R_{k,m} < \frac{H\sqrt{m}}{a\sqrt{v(1 - \rho)}}\right).$

Thus if we choose

(10.5.25)    $H = R_{k,m,\gamma}\, a\sqrt{\frac{v(1 - \rho)}{m}}$

where $R_{k,m,\gamma}$ is defined in (10.5.15), we have

(10.5.26)    $P(R < H) = P(R_{k,m} < R_{k,m,\gamma}) = \gamma$

and therefore

(10.5.27)    $P\left(\left|\sum_i c_i z_i\right| < D \text{ for all } (c_1, \ldots, c_k) \text{ in } \mathscr{C}_0\right) = \gamma$

where $D$ is given by (10.5.17).

But, since $\sum_i c_i = 0$,

            $\sum_i c_i z_i = \sum_i c_i(u_i - \mu_i)$

and hence

(10.5.28)    $P\left(\left|\sum_i c_i u_i - \sum_i c_i \mu_i\right| < D \text{ for all } (c_1, \ldots, c_k) \text{ in } \mathscr{C}_0\right) = \gamma.$

This is equivalent to stating that the probability is $\gamma$ that the inequalities
(10.5.16) hold simultaneously for all $(c_1, \ldots, c_k)$ in $\mathscr{C}_0$, thus concluding
the argument for 10.5.3.
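Two steps of the argument lend themselves to a numerical check: that a root $h$ of (10.5.18) removes the correlation between the transformed variables, and how the half-width $D$ of (10.5.17) is evaluated. The sketch below is a minimal illustration with hypothetical function names; the Studentized range point passed to `tukey_half_width` is an assumed table value, not computed here.

```python
import math

def tukey_h(k, rho):
    """A root h of (10.5.18): h^2 k[1+(k-1)rho] + 2h[1+(k-1)rho] + rho = 0."""
    q = 1 + (k - 1) * rho                 # positive when -1/(k-1) < rho < 1
    disc = 4 * q * q - 4 * k * q * rho    # equals 4 q (1 - rho), also positive
    return (-2 * q + math.sqrt(disc)) / (2 * k * q)

def z_moments(k, rho, h):
    """Variance of y_i + h*w and covariance of y_i + h*w, y_j + h*w (i != j),
    both in units of sigma^2 a^2, where w = y_1 + ... + y_k."""
    q = 1 + (k - 1) * rho
    common = 2 * h * q + h * h * k * q    # contribution of the terms involving w
    return 1 + common, rho + common

def tukey_half_width(c, a, rho, v, m, r_crit):
    """D of (10.5.17); r_crit is the Studentized range point from a table."""
    return r_crit * a * math.sqrt(v * (1 - rho) / m) * 0.5 * sum(abs(ci) for ci in c)

h = tukey_h(k=4, rho=0.5)
print(z_moments(4, 0.5, h))   # variance close to 1 - rho, covariance close to 0
```

For a simple difference, with one $c$ equal to $+1$ and one equal to $-1$, the factor $\tfrac{1}{2}\sum_i |c_i|$ equals 1, so $D$ reduces to the Studentized range point times $a\sqrt{v(1-\rho)/m}$.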

It should be noted that the probability exceeds $\gamma$ that for any finite set
of vectors in $\mathscr{C}_0$ the corresponding inequalities (10.5.16) hold
simultaneously. For instance, the probability exceeds $\gamma$ that the inequalities

            $(u_i - u_j) - R_{k,m,\gamma}\, a\sqrt{\frac{v(1 - \rho)}{m}} < (\mu_i - \mu_j) < (u_i - u_j) + R_{k,m,\gamma}\, a\sqrt{\frac{v(1 - \rho)}{m}}$

for all choices of $i, j$, for $i > j = 1, \ldots, k$, hold simultaneously. This, of
course, is equivalent to stating that

            $(u_i - u_j) \pm R_{k,m,\gamma}\, a\sqrt{\frac{v(1 - \rho)}{m}}$

are confidence intervals for $(\mu_i - \mu_j)$ respectively, for all $i > j = 1, \ldots, k$,
such that the probability exceeds $\gamma$ that all differences $(\mu_i - \mu_j)$, $i > j =
1, \ldots, k$, are simultaneously contained in their respective confidence
intervals.

Dwass (1959) has formulated a generalization of the problem of
simultaneous intervals which includes the results of Scheffé and Tukey as
special cases. It should be pointed out that the basic idea of simultaneous
confidence intervals in the special case of a confidence region for a
regression line is due to Working and Hotelling (1929).


10.6 NORMAL LINEAR REGRESSION ANALYSIS IN 
EXPERIMENTAL DESIGNS 

In this section we apply the regression theory of Section 10.3 to a fairly 
simple stochastic description of experimental designs, sometimes known 
as the Model I description. We shall only consider three of the simpler 
designs: the two-factor, the three-factor, and the Latin Square designs. 
The reader interested in fuller treatments of these and other designs and 
their associated statistical analyses should consult books by Fisher (1935a), 
Cochran and Cox (1957), Graybill (1961), Kempthorne (1946), Mann 
(1949), and Scheffe (1959). Tables of experimental designs have been 
prepared by Fisher and Yates (1938) and Kitagawa and Mitome (1953). 
A guide to the literature of experimental designs has been incorporated in 
a book by Greenwood and Hartley (1961). 

The theory of linear regression analysis has also been applied to what 
is now called response surface analysis. The principal contributors to this 
type of analysis are Box (1952, 1954), Box and Hunter (1957), and Box 
and Wilson (1951). 

(a) The Complete Two-Factor Experimental Design 

In this type of experiment it is assumed that we have a set of $rs$ independent
random variables $\{x_{\xi\eta};\ \xi = 1, \ldots, r,\ \eta = 1, \ldots, s\}$, all having
equal variances, say $\sigma^2$, and having mean values

(10.6.1)    $\mathcal{E}(x_{\xi\eta}) = \mu + \mu_{\xi\cdot} + \mu_{\cdot\eta},$

where

(10.6.2)    $\sum_{\xi=1}^{r} \mu_{\xi\cdot} = \sum_{\eta=1}^{s} \mu_{\cdot\eta} = 0.$

Note from (10.6.1) that for every $\xi$ and $\eta$, under conditions (10.6.2), $\mathcal{E}(x_{\xi\eta})$
is a linear function of $r + s - 1$ regression coefficients $\mu$, $\mu_{\xi\cdot}$, $\mu_{\cdot\eta}$ of form
(10.3.1) where the fixed variables all have the value 0, 1 or $-1$.

In this setup we may think of a rectangular array consisting of $r$ rows
and $s$ columns, the $r$ rows being associated with $r$ specified levels or
categories $R_1, \ldots, R_r$ of factor $R$, and the $s$ columns being associated
with $s$ given levels or categories $C_1, \ldots, C_s$ of factor $C$. Then $x_{\xi\eta}$ is a
random variable which describes the response or yield associated with the
combination $(R_\xi, C_\eta)$ of the $R$ and $C$ factors; the set $\{x_{\xi\eta}\}$ will be called
response random variables.

Remark. To illustrate these ideas more concretely, suppose we have $r$
operators $R_1, \ldots, R_r$ and $s$ machines $C_1, \ldots, C_s$, $s \ge r$, producing piece parts
of a certain kind. If we allow each operator to operate each machine for an
eight-hour day and let $x_{\xi\eta}$ be the output of operator $R_\xi$ from machine $C_\eta$ we have a
two-factor experiment which could be run in $s$ eight-hour days.

The regression coefficient $\mu$ in (10.6.1) is called the over-all average
response level, $\mu_{\xi\cdot}$ the differential effect due to $R_\xi$ and $\mu_{\cdot\eta}$ the differential
effect due to $C_\eta$. We wish to determine minimum variance linear estimators
for $\mu$, $\mu_{\xi\cdot}$ and $\mu_{\cdot\eta}$, variances of these estimators, and also an estimator for $\sigma^2$.

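From the set of random variables $\{x_{\xi\eta}\}$ we define the means $\bar{x}_{\cdot\cdot}$, $\bar{x}_{\xi\cdot}$ and
$\bar{x}_{\cdot\eta}$ exactly as in (8.6.7). Furthermore, let

(10.6.3)
    $m = \bar{x}_{\cdot\cdot}$
    $m_{\xi\cdot} = \bar{x}_{\xi\cdot} - \bar{x}_{\cdot\cdot}, \quad \xi = 1, \ldots, r$
    $m_{\cdot\eta} = \bar{x}_{\cdot\eta} - \bar{x}_{\cdot\cdot}, \quad \eta = 1, \ldots, s$

and

(10.6.4)
    $S = \sum_{\xi,\eta} (x_{\xi\eta} - \mu - \mu_{\xi\cdot} - \mu_{\cdot\eta})^2$
    $S_{\cdot\cdot} = \sum_{\xi,\eta} (x_{\xi\eta} - m - m_{\xi\cdot} - m_{\cdot\eta})^2$
    $S_{\cdot 0}(\mu_{\xi\cdot}) = \sum_{\xi,\eta} (m_{\xi\cdot} - \mu_{\xi\cdot})^2$
    $S_{0\cdot}(\mu_{\cdot\eta}) = \sum_{\xi,\eta} (m_{\cdot\eta} - \mu_{\cdot\eta})^2$
    $S_{00}(\mu) = \sum_{\xi,\eta} (m - \mu)^2$.

It can be verified by elementary algebra that

(10.6.5)    $S = S_{\cdot\cdot} + S_{\cdot 0}(\mu_{\xi\cdot}) + S_{0\cdot}(\mu_{\cdot\eta}) + S_{00}(\mu)$.

It should be observed that $S_{\cdot\cdot}$ is identically the same as $S_{\cdot\cdot}$ in (8.6.9),
whereas $S_{\cdot 0}(0) = S_{\cdot 0}$ and $S_{0\cdot}(0) = S_{0\cdot}$ in (8.6.9). Also note that
$S_{\cdot\cdot} + S_{\cdot 0}(0) + S_{0\cdot}(0) = S_T$, where $S_T$ is given in (8.6.9).

We know from 10.3.1 that the minimum variance linear estimators for
$\mu$, $\mu_{\xi\cdot}$, and $\mu_{\cdot\eta}$ are those values which minimize S in (10.6.4). This
minimization process, as will be seen by examining (10.6.4) and applying
10.3.1 and 10.3.2, produces $m$, $m_{\xi\cdot}$ and $m_{\cdot\eta}$ as the minimum variance linear
estimators for $\mu$, $\mu_{\xi\cdot}$ and $\mu_{\cdot\eta}$ respectively. Furthermore, it follows from
10.3.3 that $S_{\cdot\cdot}/[(r - 1)(s - 1)]$ is an unbiased estimator for $\sigma^2$.

We leave it to the reader to verify that the covariance matrix of the
estimators $m$, $m_{\xi\cdot}$ and $m_{\cdot\eta}$ has the following elements:

(10.6.6)
    $\sigma^2(m) = \frac{\sigma^2}{rs}, \quad \sigma(m, m_{\xi\cdot}) = \sigma(m, m_{\cdot\eta}) = 0$
    $\sigma^2(m_{\xi\cdot}) = \frac{(r - 1)\sigma^2}{rs}, \quad \sigma^2(m_{\cdot\eta}) = \frac{(s - 1)\sigma^2}{rs}$
    $\sigma(m_{\xi\cdot}, m_{\xi'\cdot}) = \sigma(m_{\cdot\eta}, m_{\cdot\eta'}) = -\frac{\sigma^2}{rs}, \quad \xi \ne \xi',\ \eta \ne \eta'$
    $\sigma(m_{\xi\cdot}, m_{\cdot\eta}) = 0$.

Unbiased estimators of these matrix elements, in turn, are obtained by
replacing $\sigma^2$ in each instance by $S_{\cdot\cdot}/[(r - 1)(s - 1)]$. It should be
particularly noted that the covariance between m's from any two of the
three sets $m$, $\{m_{\xi\cdot}\}$, $\{m_{\cdot\eta}\}$ is zero.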
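The estimators of (10.6.3) and the decomposition (10.6.5), taken at zero parameter values, can be checked numerically. The following is a minimal sketch in plain Python; the function name `two_factor_estimates` and the table `yields` are hypothetical and serve only as illustration.

```python
def two_factor_estimates(x):
    """Estimators for an r-by-s table x, given as a list of r rows of s responses."""
    r, s = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (r * s)
    row_means = [sum(row) / s for row in x]
    col_means = [sum(x[i][j] for i in range(r)) / r for j in range(s)]

    m = grand                                    # estimator of mu
    m_row = [rm - grand for rm in row_means]     # estimators of the row effects
    m_col = [cm - grand for cm in col_means]     # estimators of the column effects

    # Residual sum of squares S.. and the sums of squares for rows and columns.
    s_resid = sum((x[i][j] - m - m_row[i] - m_col[j]) ** 2
                  for i in range(r) for j in range(s))
    s_rows = s * sum(e * e for e in m_row)
    s_cols = r * sum(e * e for e in m_col)
    s_total = sum((x[i][j] - grand) ** 2 for i in range(r) for j in range(s))

    return {
        "m": m, "m_row": m_row, "m_col": m_col,
        "S_resid": s_resid, "S_rows": s_rows, "S_cols": s_cols,
        "S_total": s_total,
        "sigma2_hat": s_resid / ((r - 1) * (s - 1)),  # unbiased estimator of sigma^2
    }


# Hypothetical 3-by-4 table of responses (illustrative values only).
yields = [[12.0, 15.0, 11.0, 14.0],
          [10.0, 13.0, 12.0, 11.0],
          [14.0, 18.0, 13.0, 17.0]]
est = two_factor_estimates(yields)
print(est["m"], est["m_row"], est["m_col"], est["sigma2_hat"])
```

The row and column effect estimates each sum to zero, mirroring the restrictions (10.6.2), and the total sum of squares equals the sum of the residual, row, and column sums of squares.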

In conclusion we have the following result; 

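10.6.1 Let $\{x_{\xi\eta};\ \xi = 1, \ldots, r,\ \eta = 1, \ldots, s\}$ be a set of independent
       random variables all having the same variance $\sigma^2$ and means
       $\mathscr{E}(x_{\xi\eta}) = \mu + \mu_{\xi\cdot} + \mu_{\cdot\eta}$, where $\sum_\xi \mu_{\xi\cdot} = \sum_\eta \mu_{\cdot\eta} = 0$. Minimum
       variance linear estimators for $\mu$, $\mu_{\xi\cdot}$, $\mu_{\cdot\eta}$ are $m$, $m_{\xi\cdot}$, $m_{\cdot\eta}$
       respectively, whereas $S_{\cdot\cdot}/[(r - 1)(s - 1)]$ is an unbiased estimator
       for $\sigma^2$, where $S_{\cdot\cdot}$ is defined in (10.6.4). The covariance between
       m's from any two of the three sets of random variables $m$, $\{m_{\xi\cdot}\}$,
       $\{m_{\cdot\eta}\}$ is zero. The covariance matrix among all m's has elements
       given by (10.6.6).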

If the additional assumption is made that the set of random variables
$\{x_{\xi\eta};\ \xi = 1, \ldots, r,\ \eta = 1, \ldots, s\}$ are independently distributed according
to the normal distributions $N(\mu + \mu_{\xi\cdot} + \mu_{\cdot\eta}, \sigma^2)$, the following result can
be established by argument similar to that used in arriving at 10.3.4:

10.6.2 Let $\{x_{\xi\eta};\ \xi = 1, \ldots, r,\ \eta = 1, \ldots, s\}$ be a set of independent
       random variables having distributions $N(\mu + \mu_{\xi\cdot} + \mu_{\cdot\eta}, \sigma^2)$ where
       $\sum_\xi \mu_{\xi\cdot} = \sum_\eta \mu_{\cdot\eta} = 0$. Then $m$, $\{m_{\xi\cdot}\}$, $\{m_{\cdot\eta}\}$ are three independent
       normally distributed sets of random variables with means $\mu$,
       $\{\mu_{\xi\cdot}\}$, $\{\mu_{\cdot\eta}\}$ respectively, and with covariance matrices given by
       (10.6.6), the two latter distributions being degenerate and subject
       to the restrictions $\sum_\xi m_{\xi\cdot} = 0$, $\sum_\eta m_{\cdot\eta} = 0$ respectively.
       Furthermore, $S_{\cdot\cdot}/\sigma^2$, $S_{\cdot 0}(\mu_{\xi\cdot})/\sigma^2$, $S_{0\cdot}(\mu_{\cdot\eta})/\sigma^2$, $S_{00}(\mu)/\sigma^2$ are
       independent random variables having chi-square distributions
       $C((r - 1)(s - 1))$, $C(r - 1)$, $C(s - 1)$, $C(1)$ respectively.


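From this result, and by using the Student distribution, we can write down
confidence intervals for any of the constants $\mu, \mu_{1\cdot}, \ldots, \mu_{r\cdot}, \mu_{\cdot 1}, \ldots, \mu_{\cdot s}$.
For instance, 100γ% confidence intervals for $\mu_{\xi\cdot}$ and $\mu_{\cdot\eta}$ are

(10.6.7)    $m_{\xi\cdot} \pm t_{(r-1)(s-1),\gamma} \sqrt{\frac{S_{\cdot\cdot}}{rs(s - 1)}}$,    $m_{\cdot\eta} \pm t_{(r-1)(s-1),\gamma} \sqrt{\frac{S_{\cdot\cdot}}{rs(r - 1)}}$

respectively, the quantities under the radicals being the variances in
(10.6.6) with $\sigma^2$ replaced by $S_{\cdot\cdot}/[(r - 1)(s - 1)]$. We can similarly set up
confidence intervals for the differences such as $\mu_{\xi\cdot} - \mu_{\xi'\cdot}$, $\xi \ne \xi'$, or
other linear functions of the $\mu$'s. Also by using 10.5.2 or 10.5.3 we can
set up sets of confidence intervals for $\mu_{\xi\cdot} - \mu_{\xi'\cdot}$, $\xi > \xi' = 1, \ldots, r$ (or
for $\mu_{\cdot\eta} - \mu_{\cdot\eta'}$, $\eta > \eta' = 1, \ldots, s$) which hold simultaneously with
probability at least $\gamma$.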

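Also, confidence regions similar to those defined by (10.4.6) can be set
up for simultaneous estimation of two or more of the $\mu$'s. A particularly
important case arises when it is desired to estimate the $\mu_{\xi\cdot}$ (or the
$\mu_{\cdot\eta}$) simultaneously. Suppose we wish to estimate the $\mu_{\xi\cdot}$ simultaneously.
Since, as stated in 10.6.2, $S_{\cdot 0}(\mu_{\xi\cdot})/\sigma^2$ and $S_{\cdot\cdot}/\sigma^2$ are independently
distributed according to chi-square distributions $C(r - 1)$ and
$C((r - 1)(s - 1))$, it follows from 7.8.5 that $(s - 1)S_{\cdot 0}(\mu_{\xi\cdot})/S_{\cdot\cdot}$ has the
Snedecor distribution $S((r - 1), (r - 1)(s - 1))$. Therefore, we have, using
the notation of (10.4.6),

(10.6.8)    $P\left(\frac{(s - 1)\,S_{\cdot 0}(\mu_{\xi\cdot})}{S_{\cdot\cdot}} \le \mathscr{F}_{r-1,(r-1)(s-1),\gamma}\right) = \gamma$.

But this can be stated as follows:

(10.6.9)    $P((\mu_{1\cdot}, \ldots, \mu_{r\cdot}) \in R_\gamma) = \gamma$

where $R_\gamma$ is the (r - 1)-dimensional 100γ% confidence sphere for the point
$(\mu_{1\cdot}, \ldots, \mu_{r\cdot})$, having equation

(10.6.10)    $\sum_{\xi,\eta} (\mu_{\xi\cdot} - m_{\xi\cdot})^2 = \frac{S_{\cdot\cdot}}{s - 1}\, \mathscr{F}_{r-1,(r-1)(s-1),\gamma}$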

subject to $\sum_\xi (\mu_{\xi\cdot} - m_{\xi\cdot}) = 0$. If we are particularly interested in the
possibility that the $\mu_{\xi\cdot}$ are all zero, we see whether the sphere having
equation (10.6.10) includes the point (0, ..., 0). This reduces merely to
checking whether the inequality

(10.6.11)    $\frac{(s - 1)\,S_{\cdot 0}(0)}{S_{\cdot\cdot}} \le \mathscr{F}_{r-1,(r-1)(s-1),\gamma}$

holds. If it does hold, we say that the set of random variables $\{x_{\xi\eta}\}$
supports the statistical hypothesis that the $\mu_{\xi\cdot}$ are all zero at the 100γ%
confidence level. If (10.6.11) holds, an alternative statement is to say
that the $m_{\xi\cdot}$ (the estimators for the $\mu_{\xi\cdot}$) are not significantly different
from zero at the 100(1 - γ)% level of significance. The important point is
that the random variable used for making the test, namely
$(s - 1)S_{\cdot 0}(0)/S_{\cdot\cdot}$, is observable.

In a similar manner we test whether the set of random variables $\{x_{\xi\eta}\}$
supports the statistical hypothesis that the $\mu_{\cdot\eta}$ are all zero.

It is customary to set up the constituents of the Model I description of
the complete two-factor design under the assumption of normality into an
analysis of variance table as shown in Table 10.1, remembering that
$S_{\cdot 0}(0) = S_{\cdot 0}$ and $S_{0\cdot}(0) = S_{0\cdot}$.

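Table 10.1 Model I Analysis of Variance Table for
Complete Two-Factor Experimental Design

Source of            Degrees of         Sum of            Mean Sum of                            Snedecor
Variation            Freedom (D.F.)     Squares (S.S.)    Squares (M.S.S.)                       $\mathscr{F}$-Ratio

Rows                 $r - 1$            $S_{\cdot 0}$     $S_{\cdot 0}/(r - 1)$                  $(s - 1)S_{\cdot 0}/S_{\cdot\cdot} = \mathscr{F}_{\cdot 0}$
Columns              $s - 1$            $S_{0\cdot}$      $S_{0\cdot}/(s - 1)$                   $(r - 1)S_{0\cdot}/S_{\cdot\cdot} = \mathscr{F}_{0\cdot}$
Residuals (error)    $(r - 1)(s - 1)$   $S_{\cdot\cdot}$  $S_{\cdot\cdot}/[(r - 1)(s - 1)]$
Total                $rs - 1$           $S_T$
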

The first Snedecor ratio $\mathscr{F}_{\cdot 0}$ is to test the hypothesis that the $\mu_{\xi\cdot}$ are
all zero, and the second, $\mathscr{F}_{0\cdot}$, is to test the hypothesis that the $\mu_{\cdot\eta}$ are all zero.

The arrangement of analysis of variance constituents into table form, 
such as Table 10.1, is due to Fisher (1925^), who also developed the theory 
and application of Model I analysis of variance procedures.
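The entries of a Model I table of this kind can be computed directly from the observations. The sketch below is a plain-Python illustration only: `anova_table_model1` and the data `responses` are hypothetical names and values, and the residual sum of squares is obtained by subtraction, which relies on the decomposition of the total sum of squares. Comparing each ratio with a critical value of the Snedecor distribution still requires tables or a statistical library, which the sketch does not attempt.

```python
def anova_table_model1(x):
    """Rows of a Model I two-factor table: (source, D.F., S.S., M.S.S., ratio)."""
    r, s = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (r * s)
    s_rows = s * sum((sum(row) / s - grand) ** 2 for row in x)
    s_cols = r * sum((sum(x[i][j] for i in range(r)) / r - grand) ** 2
                     for j in range(s))
    s_total = sum((v - grand) ** 2 for row in x for v in row)
    s_resid = s_total - s_rows - s_cols          # residual S.S. by subtraction

    ms_rows = s_rows / (r - 1)
    ms_cols = s_cols / (s - 1)
    ms_resid = s_resid / ((r - 1) * (s - 1))
    return [
        ("Rows", r - 1, s_rows, ms_rows, ms_rows / ms_resid),
        ("Columns", s - 1, s_cols, ms_cols, ms_cols / ms_resid),
        ("Residuals (error)", (r - 1) * (s - 1), s_resid, ms_resid, None),
        ("Total", r * s - 1, s_total, None, None),
    ]


# Hypothetical 3-by-3 table of responses (illustrative values only).
responses = [[10.0, 12.0, 11.0],
             [14.0, 15.0, 17.0],
             [9.0, 13.0, 12.0]]
table = anova_table_model1(responses)
for source, df, ss, ms, ratio in table:
    print(source, df, round(ss, 4),
          None if ms is None else round(ms, 4),
          None if ratio is None else round(ratio, 4))
```

The ratio of mean squares for rows equals $(s - 1)$ times the row sum of squares divided by the residual sum of squares, which is the form in which the row ratio appears in the table.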

(b) The Complete Three-Factor Experimental Design 

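The ideas of Section 10.6(a) can be extended in a straightforward fashion
to higher order designs, that is, experimental layouts involving three or
more factors. It is perhaps worthwhile to show how the extension goes
for three factors. Here we have a set of rst independent random variables
$\{x_{\xi\eta\zeta};\ \xi = 1, \ldots, r;\ \eta = 1, \ldots, s;\ \zeta = 1, \ldots, t\}$ whose variances are all
equal to $\sigma^2$ and whose mean values are given by

(10.6.12)    $\mathscr{E}(x_{\xi\eta\zeta}) = \mu + \mu_{\xi\cdot\cdot} + \mu_{\cdot\eta\cdot} + \mu_{\cdot\cdot\zeta} + \mu_{\xi\eta\cdot} + \mu_{\xi\cdot\zeta} + \mu_{\cdot\eta\zeta}$

where

    $\sum_\xi \mu_{\xi\cdot\cdot} = \sum_\eta \mu_{\cdot\eta\cdot} = \sum_\zeta \mu_{\cdot\cdot\zeta} = 0$,
    $\sum_\xi \mu_{\xi\eta\cdot} = \sum_\eta \mu_{\xi\eta\cdot} = 0, \quad \sum_\xi \mu_{\xi\cdot\zeta} = \sum_\zeta \mu_{\xi\cdot\zeta} = 0$,
    $\sum_\eta \mu_{\cdot\eta\zeta} = \sum_\zeta \mu_{\cdot\eta\zeta} = 0$.

In this type of experimental layout we may think of r rows, s columns,
and t layers, the rows being associated with r given levels $R_1, \ldots, R_r$ of a
factor R, columns being associated with s given levels $C_1, \ldots, C_s$ of factor
C, and layers associated with t given levels $L_1, \ldots, L_t$ of factor L. The
constant $\mu_{\xi\cdot\cdot}$ is the differential effect due to $R_\xi$, with similar
interpretations for $\mu_{\cdot\eta\cdot}$ and $\mu_{\cdot\cdot\zeta}$, whereas $\mu_{\xi\eta\cdot}$ is the differential effect
associated with $(R_\xi, C_\eta)$ or the interaction between $R_\xi$ and $C_\eta$, with similar
interpretations for $\mu_{\xi\cdot\zeta}$ and $\mu_{\cdot\eta\zeta}$.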

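To estimate the various $\mu$'s we define $\bar{x}_{\cdot\cdot\cdot}$, $\bar{x}_{\xi\cdot\cdot}$, $\bar{x}_{\cdot\eta\cdot}$, $\bar{x}_{\cdot\cdot\zeta}$, $\bar{x}_{\xi\eta\cdot}$,
$\bar{x}_{\xi\cdot\zeta}$, and $\bar{x}_{\cdot\eta\zeta}$ by obvious extension of (8.6.7). Also, we define

(10.6.13)
    $m = \bar{x}_{\cdot\cdot\cdot}$
    $m_{\xi\cdot\cdot} = \bar{x}_{\xi\cdot\cdot} - \bar{x}_{\cdot\cdot\cdot}$, similarly for $m_{\cdot\eta\cdot}$ and $m_{\cdot\cdot\zeta}$,
    $m_{\xi\eta\cdot} = \bar{x}_{\xi\eta\cdot} - \bar{x}_{\xi\cdot\cdot} - \bar{x}_{\cdot\eta\cdot} + \bar{x}_{\cdot\cdot\cdot}$, similarly for $m_{\xi\cdot\zeta}$ and $m_{\cdot\eta\zeta}$.

Furthermore, let

(10.6.14)
    $S = \sum_{\xi,\eta,\zeta} (x_{\xi\eta\zeta} - \mu - \mu_{\xi\cdot\cdot} - \mu_{\cdot\eta\cdot} - \mu_{\cdot\cdot\zeta} - \mu_{\xi\eta\cdot} - \mu_{\xi\cdot\zeta} - \mu_{\cdot\eta\zeta})^2$
    $S_{\cdot\cdot\cdot} = \sum_{\xi,\eta,\zeta} (x_{\xi\eta\zeta} - m - m_{\xi\cdot\cdot} - m_{\cdot\eta\cdot} - m_{\cdot\cdot\zeta} - m_{\xi\eta\cdot} - m_{\xi\cdot\zeta} - m_{\cdot\eta\zeta})^2$
    $S_{\cdot\cdot 0}(\mu_{\xi\eta\cdot}) = \sum_{\xi,\eta,\zeta} (m_{\xi\eta\cdot} - \mu_{\xi\eta\cdot})^2$, similarly for $S_{\cdot 0\cdot}(\mu_{\xi\cdot\zeta})$ and $S_{0\cdot\cdot}(\mu_{\cdot\eta\zeta})$,
    $S_{\cdot 00}(\mu_{\xi\cdot\cdot}) = \sum_{\xi,\eta,\zeta} (m_{\xi\cdot\cdot} - \mu_{\xi\cdot\cdot})^2$, similarly for $S_{0\cdot 0}(\mu_{\cdot\eta\cdot})$ and $S_{00\cdot}(\mu_{\cdot\cdot\zeta})$,
    $S_{000}(\mu) = \sum_{\xi,\eta,\zeta} (m - \mu)^2$.

Note that $S_{\cdot\cdot\cdot}$ is exactly as defined in (8.6.31), whereas $S_{\cdot\cdot 0}(0) = S_{\cdot\cdot 0}$,
$S_{\cdot 00}(0) = S_{\cdot 00}$, etc., where $S_{\cdot\cdot 0}$, $S_{\cdot 00}$, etc., are defined in (8.6.31). Then
we have the following decomposition for S:

(10.6.15)    $S = S_{\cdot\cdot\cdot} + S_{\cdot\cdot 0}(\mu_{\xi\eta\cdot}) + S_{\cdot 0\cdot}(\mu_{\xi\cdot\zeta}) + S_{0\cdot\cdot}(\mu_{\cdot\eta\zeta})$
             $\quad + S_{\cdot 00}(\mu_{\xi\cdot\cdot}) + S_{0\cdot 0}(\mu_{\cdot\eta\cdot}) + S_{00\cdot}(\mu_{\cdot\cdot\zeta}) + S_{000}(\mu)$.

If we make use of 10.3.1 it is evident that the minimum variance linear
estimators for $\mu$, $\mu_{\xi\cdot\cdot}$, $\mu_{\cdot\eta\cdot}$, $\mu_{\cdot\cdot\zeta}$, $\mu_{\xi\eta\cdot}$, $\mu_{\xi\cdot\zeta}$, $\mu_{\cdot\eta\zeta}$ are $m$, $m_{\xi\cdot\cdot}$, $m_{\cdot\eta\cdot}$,
$m_{\cdot\cdot\zeta}$, $m_{\xi\eta\cdot}$, $m_{\xi\cdot\zeta}$, $m_{\cdot\eta\zeta}$ respectively. Furthermore, if we use 10.3.3 we find
that $S_{\cdot\cdot\cdot}/[(r - 1)(s - 1)(t - 1)]$ is an unbiased estimator for $\sigma^2$.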

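It is sufficient to list only the following nonzero elements in the covariance
matrix of the m's since the remaining nonzero elements can be written
down by considerations of symmetry:

(10.6.16)
    $\sigma^2(m) = \frac{\sigma^2}{rst}, \quad \sigma^2(m_{\xi\cdot\cdot}) = \frac{(r - 1)\sigma^2}{rst}, \quad \sigma(m_{\xi\cdot\cdot}, m_{\xi'\cdot\cdot}) = -\frac{\sigma^2}{rst}$,
    $\sigma^2(m_{\xi\eta\cdot}) = \frac{(r - 1)(s - 1)\sigma^2}{rst}$,
    $\sigma(m_{\xi\eta\cdot}, m_{\xi\eta'\cdot}) = -\frac{(r - 1)\sigma^2}{rst}, \quad \sigma(m_{\xi\eta\cdot}, m_{\xi'\eta'\cdot}) = \frac{\sigma^2}{rst}$,
    $\xi \ne \xi', \quad \eta \ne \eta'$.

The covariance between m's from any two of the sets $m$, $\{m_{\xi\cdot\cdot}\}$, $\{m_{\cdot\eta\cdot}\}$,
$\{m_{\cdot\cdot\zeta}\}$, $\{m_{\xi\eta\cdot}\}$, $\{m_{\xi\cdot\zeta}\}$, $\{m_{\cdot\eta\zeta}\}$ is zero.

Substituting $S_{\cdot\cdot\cdot}/[(r - 1)(s - 1)(t - 1)]$ for $\sigma^2$ in (10.6.16) provides
unbiased estimators for the variances and covariances defined in (10.6.16).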

Formulation of statements similar to 10.6.1 and 10.6.2 for the three-factor
experiment is left to the reader.

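In extending 10.6.2 to the three-factor experimental design we point out
that if the random variables in the set $\{x_{\xi\eta\zeta}\}$ are independent and have
normal distributions

(10.6.17)    $N(\mu + \mu_{\xi\cdot\cdot} + \mu_{\cdot\eta\cdot} + \mu_{\cdot\cdot\zeta} + \mu_{\xi\eta\cdot} + \mu_{\xi\cdot\zeta} + \mu_{\cdot\eta\zeta},\ \sigma^2)$

then

(10.6.18)    $\frac{S_{\cdot\cdot\cdot}}{\sigma^2},\ \frac{S_{\cdot\cdot 0}(\mu_{\xi\eta\cdot})}{\sigma^2},\ \frac{S_{\cdot 0\cdot}(\mu_{\xi\cdot\zeta})}{\sigma^2},\ \frac{S_{0\cdot\cdot}(\mu_{\cdot\eta\zeta})}{\sigma^2},\ \frac{S_{\cdot 00}(\mu_{\xi\cdot\cdot})}{\sigma^2},\ \frac{S_{0\cdot 0}(\mu_{\cdot\eta\cdot})}{\sigma^2},\ \frac{S_{00\cdot}(\mu_{\cdot\cdot\zeta})}{\sigma^2},\ \frac{S_{000}(\mu)}{\sigma^2}$

are independently distributed according to chi-square distributions with

(10.6.19)    $(r - 1)(s - 1)(t - 1)$, $(r - 1)(s - 1)$, $(r - 1)(t - 1)$,
             $(s - 1)(t - 1)$, $(r - 1)$, $(s - 1)$, $(t - 1)$, $1$

degrees of freedom respectively. Here we can set up confidence intervals
for individual $\mu$'s similar to those given in (10.6.7) for two-factor
experimental designs, and confidence regions similar to those in (10.6.10)
for the simultaneous estimation of several $\mu$'s. By using 10.5.2 or 10.5.3,
we can also set up simultaneous confidence intervals for sets of differences
such as $\{\mu_{\xi\cdot\cdot} - \mu_{\xi'\cdot\cdot};\ \xi > \xi' = 1, \ldots, r\}$, $\{\mu_{\xi\eta\cdot} - \mu_{\xi'\eta\cdot};\ \xi > \xi' = 1, \ldots, r$
and $\eta$ fixed$\}$, and so on.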

We leave it to the reader to set up a Model I analysis of variance table 
similar to Table 10.1 for the complete three-factor design. 

(c) The Latin Square Experimental Design 

The complete three-factor experimental design described above involves
rst random variables, which, in some applications, may constitute a
prohibitive amount of experimentation. Reduction of the amount of
experimentation can be achieved by balanced incomplete three-factor
designs. The simplest of these designs and the only one we shall consider
is the Latin Square design, which selects a balanced incomplete matrix
sample $\{x_{\xi\eta\zeta}\}^*$ of $r^2$ response random variables from a three-way sample
matrix $\{x_{\xi\eta\zeta}\}$ of r rows, r columns, and r layers.

In the Latin Square design we consider three factors R, C, and L and r
levels of each of the three factors, $R_1, \ldots, R_r$; $C_1, \ldots, C_r$; $L_1, \ldots, L_r$.
We may associate $R_\xi$ with the $\xi$th row, $C_\eta$ with the $\eta$th column and $L_\zeta$
with the $\zeta$th layer of a three-dimensional rectangular array of cells. The
sample $\{x_{\xi\eta\zeta}\}^*$ of random variables for our Latin Square design is a
balanced incomplete matrix sample having $r^2$ components from the
matrix sample $\{x_{\xi\eta\zeta}\}$ associated with this r x r x r array of cells;
$\{x_{\xi\eta\zeta}\}^*$ is selected so that for each combination of values of $(\xi, \eta)$ there
is exactly one random variable, for each combination of values of $(\xi, \zeta)$
exactly one random variable, and for each combination of values of
$(\eta, \zeta)$ exactly one random variable (see Section 8.6(d)). Let the set of $r^2$
values of $(\xi, \eta, \zeta)$ thus selected be $G^*$.

Now it is assumed that the $r^2$ elements in the sample $\{x_{\xi\eta\zeta}\}^*$ are
independent with normal distributions $N(\mu_{\xi\eta\zeta}, \sigma^2)$ where for any
$(\xi, \eta, \zeta) \in G^*$, we have

(10.6.20)    $\mu_{\xi\eta\zeta} = \mu + \mu_{\xi\cdot\cdot} + \mu_{\cdot\eta\cdot} + \mu_{\cdot\cdot\zeta}$

where

(10.6.21)    $\sum_\xi \mu_{\xi\cdot\cdot} = \sum_\eta \mu_{\cdot\eta\cdot} = \sum_\zeta \mu_{\cdot\cdot\zeta} = 0$.

Comparing this with (10.6.12) it is to be particularly noted that we assume
all second-order interactions to be zero.

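The Latin Square design provides relatively simple minimum variance
linear estimators for the $\mu$'s in (10.6.20). To see this let $\bar{x}^*$, $\bar{x}^*_{\xi\cdot\cdot}$, $\bar{x}^*_{\cdot\eta\cdot}$,
$\bar{x}^*_{\cdot\cdot\zeta}$ be defined as in (8.6.40). Also, let $m^*$, $m^*_{\xi\cdot\cdot}$, $m^*_{\cdot\eta\cdot}$, $m^*_{\cdot\cdot\zeta}$ be defined
in terms of the x's in $\{x_{\xi\eta\zeta}\}^*$ in exactly the same way that $m$, $m_{\xi\cdot\cdot}$, $m_{\cdot\eta\cdot}$,
$m_{\cdot\cdot\zeta}$ are defined in terms of the x's of $\{x_{\xi\eta\zeta}\}$ in (10.6.13). Furthermore,
let $S^*_{\cdot 00}(\mu_{\xi\cdot\cdot})$, $S^*_{0\cdot 0}(\mu_{\cdot\eta\cdot})$, $S^*_{00\cdot}(\mu_{\cdot\cdot\zeta})$, $S^*_{000}(\mu)$ be defined as in (10.6.14)
with m's replaced by their corresponding m*'s, the summation being
performed for all $(\xi, \eta, \zeta) \in G^*$, of course. Also, let

(10.6.22)
    $S^* = \sum^* (x_{\xi\eta\zeta} - \mu - \mu_{\xi\cdot\cdot} - \mu_{\cdot\eta\cdot} - \mu_{\cdot\cdot\zeta})^2$
    $S^*_{\cdot\cdot\cdot} = \sum^* (x_{\xi\eta\zeta} - m^* - m^*_{\xi\cdot\cdot} - m^*_{\cdot\eta\cdot} - m^*_{\cdot\cdot\zeta})^2$

where, as in (8.6.40), $\sum^*$ denotes summation over all $(\xi, \eta, \zeta) \in G^*$.

Now we have the following decomposition of $S^*$,

(10.6.23)    $S^* = S^*_{\cdot\cdot\cdot} + S^*_{\cdot 00}(\mu_{\xi\cdot\cdot}) + S^*_{0\cdot 0}(\mu_{\cdot\eta\cdot}) + S^*_{00\cdot}(\mu_{\cdot\cdot\zeta}) + S^*_{000}(\mu)$

where the components on the right have $(r - 1)(r - 2)$, $(r - 1)$, $(r - 1)$,
$(r - 1)$, $1$ degrees of freedom respectively. Applying 10.3.2, it is evident
that $m^*$, $m^*_{\xi\cdot\cdot}$, $m^*_{\cdot\eta\cdot}$, $m^*_{\cdot\cdot\zeta}$ are minimum variance linear estimators for $\mu$,
$\mu_{\xi\cdot\cdot}$, $\mu_{\cdot\eta\cdot}$, $\mu_{\cdot\cdot\zeta}$ respectively. Furthermore, $S^*_{\cdot\cdot\cdot}/[(r - 1)(r - 2)]$ is an
unbiased estimator for $\sigma^2$.

The covariance matrix for the estimators $m^*$, $m^*_{\xi\cdot\cdot}$, $m^*_{\cdot\eta\cdot}$, $m^*_{\cdot\cdot\zeta}$ has the
following nonzero elements:

(10.6.24)
    $\sigma^2(m^*) = \frac{\sigma^2}{r^2}$,
    $\sigma^2(m^*_{\xi\cdot\cdot}) = \sigma^2(m^*_{\cdot\eta\cdot}) = \sigma^2(m^*_{\cdot\cdot\zeta}) = \frac{(r - 1)\sigma^2}{r^2}$,
    $\sigma(m^*_{\xi\cdot\cdot}, m^*_{\xi'\cdot\cdot}) = \sigma(m^*_{\cdot\eta\cdot}, m^*_{\cdot\eta'\cdot}) = \sigma(m^*_{\cdot\cdot\zeta}, m^*_{\cdot\cdot\zeta'}) = -\frac{\sigma^2}{r^2}$
    for $\xi \ne \xi'$, $\eta \ne \eta'$, $\zeta \ne \zeta'$.

Unbiased estimators for these variances and covariances are obtained by
replacing $\sigma^2$ by $S^*_{\cdot\cdot\cdot}/[(r - 1)(r - 2)]$.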

Formulation of summary statements similar to 10.6.1 and 10.6.2 for the 
Latin Square design is left to the reader. We also leave it to the reader to 
set up a Model I analysis of variance table similar to Table 10.1 for the 
Latin Square. 
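As an illustration of the Latin Square estimators, the following plain-Python sketch uses one particular (assumed) selection of cells, the cyclic square in which the response in row i and column j belongs to layer (i + j) mod r. The function name `latin_square_estimates` and the data `plots` are hypothetical; any other valid Latin Square selection would serve equally well.

```python
def latin_square_estimates(x):
    """Estimators for an r-by-r Latin Square; x[i][j] lies in layer (i + j) % r."""
    r = len(x)
    cells = [(i, j, (i + j) % r) for i in range(r) for j in range(r)]
    grand = sum(x[i][j] for i, j, _ in cells) / r ** 2

    # Differential-effect estimators for rows, columns, and layers.
    row_eff = [sum(x[i][j] for j in range(r)) / r - grand for i in range(r)]
    col_eff = [sum(x[i][j] for i in range(r)) / r - grand for j in range(r)]
    lay_eff = [sum(x[i][j] for i, j, k in cells if k == z) / r - grand
               for z in range(r)]

    s_resid = sum((x[i][j] - grand - row_eff[i] - col_eff[j] - lay_eff[k]) ** 2
                  for i, j, k in cells)
    s_rows = r * sum(e * e for e in row_eff)
    s_cols = r * sum(e * e for e in col_eff)
    s_lays = r * sum(e * e for e in lay_eff)
    s_total = sum((x[i][j] - grand) ** 2 for i, j, _ in cells)

    return {
        "cells": cells, "m": grand,
        "row_eff": row_eff, "col_eff": col_eff, "lay_eff": lay_eff,
        "S_resid": s_resid, "S_rows": s_rows, "S_cols": s_cols,
        "S_lays": s_lays, "S_total": s_total,
        "sigma2_hat": s_resid / ((r - 1) * (r - 2)),  # requires r >= 3
    }


# Hypothetical 4-by-4 responses (illustrative values only).
plots = [[20.0, 23.0, 19.0, 25.0],
         [22.0, 21.0, 24.0, 20.0],
         [18.0, 26.0, 22.0, 23.0],
         [24.0, 19.0, 21.0, 27.0]]
ls = latin_square_estimates(plots)
print(ls["m"], ls["row_eff"], ls["col_eff"], ls["lay_eff"], ls["sigma2_hat"])
```

Because every level of each factor meets every level of the other two exactly once, the total sum of squares about the mean splits into the residual, row, column, and layer sums of squares.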

10.7 ESTIMATION OF VARIANCE COMPONENTS FROM 
LINEAR COMBINATIONS OF RANDOM VARIABLES 

In many statistical problems, particularly in such fields as the design of
experiments, error theory, and psychological factor analysis, there are
situations where only linear combinations of certain random variables
having zero covariances are observable, whereas the random variables
themselves are not observable. An example of such a linear combination
of random variables is that given in (8.6.3), where $x(u, v)$ is an observable
random variable which is expressed as a linear function of $\mu$ and
unobservable random variables. In such situations the basic problem is to
devise estimators for the variances of the component random variables
from the observable linear combinations of these unobservable random
variables.

More precisely, let $x_\xi$, $\xi = 1, \ldots, n$, be observable random variables
such that $x_\xi$ has the following form:

(10.7.1)    $x_\xi = \mu + \sum_{g=1}^{m} a_{g\xi}\, \epsilon_{g\xi}, \quad \xi = 1, \ldots, n > m$

where

(10.7.2)    $\mathscr{E}(\epsilon_{g\xi}) = 0; \quad \mathscr{E}(\epsilon_{g\xi}^2) = \sigma_g^2, \quad g = 1, \ldots, m;\ \xi = 1, \ldots, n$
            $\mathscr{E}(\epsilon_{g\xi}\, \epsilon_{g'\xi'}) = 0, \quad g \ne g' \text{ or } \xi \ne \xi'$,

and where the $a_{g\xi}$ are known constants and $\|a_{g\xi}\|$ is of rank m. It is not
assumed that the $\epsilon_{g\xi}$ are observable, that is, they may be functions of
unknown parameters.

Now it is evident that the mean of the sample, $\bar{x}$, is an unbiased estimator
for $\mu$. The variance of this estimator is

(10.7.3)    $\sigma^2(\bar{x}) = \frac{1}{n^2} \sum_g \sum_\xi a_{g\xi}^2 \sigma_g^2$.

Also, if $s^2$ is the sample variance, we have

(10.7.4)    $\mathscr{E}(s^2) = \frac{1}{n} \sum_g \sum_\xi a_{g\xi}^2 \sigma_g^2$.

Note that $s^2$ is an unbiased estimator for the quantity $(1/n) \sum_g \sum_\xi a_{g\xi}^2 \sigma_g^2$
and hence that $s^2/n$ is an unbiased estimator for $\sigma^2(\bar{x})$. But we wish to
estimate the individual variance components $\sigma_1^2, \ldots, \sigma_m^2$ from the sample
$(x_1, \ldots, x_n)$. To do this consider the total sum of squares

(10.7.5)    $S_T = \sum_{\xi=1}^{n} (x_\xi - \bar{x})^2$.

On the right we have a quadratic form in the random variables
$(x_1 - \bar{x}), \ldots, (x_n - \bar{x})$ which can be expressed as one in
$(x_1 - \mu), \ldots, (x_n - \mu)$, having a matrix of rank $n - 1$. If $n - 1 \ge m$, it can be
shown by the elementary theory of positive semidefinite quadratic forms
that $S_T$ can be decomposed (in many ways) as follows:

(10.7.6)    $S_T = S_1 + \cdots + S_m$

where $S_1, \ldots, S_m$ are positive semidefinite quadratic forms in
$(x_1 - \bar{x}), \ldots, (x_n - \bar{x})$ which can be expressed as quadratic forms in
$(x_1 - \mu), \ldots, (x_n - \mu)$ having matrices of ranks $n_1, \ldots, n_m$ (all $\ge 1$)
respectively, where $n_1 + \cdots + n_m = n - 1$. Then we may write for
$h = 1, \ldots, m$

(10.7.7)    $S_h = \sum_{\xi,\xi'=1}^{n} A^{(h)}_{\xi\xi'} (x_\xi - \mu)(x_{\xi'} - \mu)$

where $\|A^{(h)}_{\xi\xi'}\|$ is a symmetric matrix of rank $n_h$ whose elements are known
numbers. Substituting $(x_\xi - \mu)$ and $(x_{\xi'} - \mu)$ from (10.7.1) into (10.7.7) and
taking mean values we find

(10.7.8)    $\mathscr{E}(S_h) = \sum_g B_{hg}\, \sigma_g^2$

where

(10.7.9)    $B_{hg} = \sum_\xi A^{(h)}_{\xi\xi}\, a_{g\xi}^2$.

Now it follows from the properties of $\|A^{(h)}_{\xi\xi'}\|$ together with our assumptions
about $\|a_{g\xi}\|$ that $\|B_{hg}\|$ is nonsingular. Therefore, applying 10.1.1, we
have from (10.7.8) the following unbiased estimators for $\sigma_1^2, \ldots, \sigma_m^2$:

(10.7.10)    $\mathscr{E}^{-1}(\sigma_g^2) = \sum_h B^{gh} S_h, \quad \|B^{gh}\| = \|B_{hg}\|^{-1}$.

An unbiased estimator for $\sigma^2(\bar{x})$ is obtained by replacing $\sigma_g^2$ in (10.7.3) by
$\mathscr{E}^{-1}(\sigma_g^2)$, $g = 1, \ldots, m$.

To summarize: 

10.7.1 Suppose $x_\xi$, $\xi = 1, \ldots, n > m$, are observable random variables
       such that

           $x_\xi = \mu + \sum_{g=1}^{m} a_{g\xi}\, \epsilon_{g\xi}$

       the $\epsilon_{g\xi}$ being random variables whose means and covariances are all
       zero and such that $\mathscr{E}(\epsilon_{g\xi}^2) = \sigma_g^2$, where $\mu$ and the $\sigma_g^2$ are unknown
       parameters and the $a_{g\xi}$ are known numbers such that $\|a_{g\xi}\|$ is of
       rank m. Let $\bar{x}$ be the mean of $(x_1, \ldots, x_n)$ and let $S_T, S_1, \ldots, S_m$
       be quadratic forms in $(x_1 - \bar{x}), \ldots, (x_n - \bar{x})$ as defined in (10.7.5)
       and (10.7.6). Then $\bar{x}$ is an unbiased estimator for $\mu$ whereas

           $\mathscr{E}^{-1}(\sigma_g^2) = \sum_h B^{gh} S_h$

       are unbiased estimators for $\sigma_g^2$, $g = 1, \ldots, m$, where
       $\|B^{gh}\| = \|B_{hg}\|^{-1}$ and

           $B_{hg} = \sum_\xi A^{(h)}_{\xi\xi}\, a_{g\xi}^2$,

       $\|A^{(h)}_{\xi\xi'}\|$ being the matrix of the quadratic form $S_h$.

In many special problems in which variance components are to be
estimated, the matrix of known numbers $\|a_{g\xi}\|$ takes a special form which
immediately suggests the decomposition (10.7.6) for $S_T$. In the next
section we shall consider several applications of the preceding principles
to experimental designs.

10.8 ESTIMATORS FOR VARIANCE COMPONENTS IN 
EXPERIMENTAL DESIGNS 

In this section we shall apply the ideas of estimation of variance com¬ 
ponents to some of the simpler experimental designs. Actually, we shall 
consider only the complete two-factor, the balanced incomplete two- 
factor, the complete three-factor, and the Latin Square designs from the 
point of view of variance component analysis. The reader interested in 
this subject in greater detail than given here is referred to books by 
Cochran and Cox (1957), Graybill (1961), Kempthorne (1952), and Scheffé
(1959). We shall consider only the problem of determining unbiased
estimators of the variance components involved in these experimental
designs and not the variances of these estimators. The reader interested
in this latter problem is referred to papers by Hooke (1956b), Scheffé
(1959), and Tukey (1956b, 1957a, 1957b). Results on confidence intervals
of these variance component estimators have been obtained by Bulmer
(1957), Moriguti (1954), Scheffé (1959), and others.

The stochastic description of experimental designs by using observable 
linear functions of random variables is sometimes called the Model II 
description as contrasted with the Model I description based on normal 
regression analysis and discussed in Section 10.6. The Model I and Model
II designations for the two analyses of variance schemes were introduced 
by Eisenhart (1947). 

(a) The Complete Two-Factor Experimental Design 

The basic sampling theory required for the Model II stochastic descrip¬ 
tion of a complete two-factor experimental design is provided by that of 
second-order matrix samples as presented in Section 8.6(a). We shall 
therefore adopt the notation of that section. 

First, however, some comments are in order on the basic difference
between the Model I and the Model II approaches to the complete
two-factor experimental design. In the Model I description of the complete
two-factor experimental design discussed in Section 10.6(a) it was assumed
that R-factor levels $R_1, \ldots, R_r$ and C-factor levels $C_1, \ldots, C_s$ were fixed
for the totality of all experiments described by the response random
variables $\{x_{\xi\eta}\}$. In some situations, however, the r R-factor levels or the s
C-factor levels or both are samples from larger populations of levels which
may be finite or infinite. In this instance the regression analysis (Model I)
approach is not appropriate. We shall consider the case where the
populations of levels are infinite. More precisely, in the Model II approach
we assume that for any R-factor level u and any C-factor level v, the
response random variable $x(u, v)$ is of the form given by (8.6.3). Its
variance $\sigma^2$ will be given by (8.6.4) or, in shorter notation, by (8.6.4a),
that is, by

(10.8.1)    $\sigma^2 = \sigma^2_{\cdot 0} + \sigma^2_{0\cdot} + \sigma^2_{\cdot\cdot}$

where $\sigma^2_{\cdot 0}$ is the row (R-factor) component, $\sigma^2_{0\cdot}$ the column (C-factor)
component, and $\sigma^2_{\cdot\cdot}$ is the residual component.

If r R-factor levels are drawn at random, and s C-factor levels are drawn
at random, both from infinite populations, our response random
variables will be the matrix sample $\{x_{\xi\eta};\ \xi = 1, \ldots, r;\ \eta = 1, \ldots, s\}$ as
defined in Section 8.6(a), the $x_{\xi\eta}$ being independent and expressible by
(8.6.8), that is,

(10.8.2)    $x_{\xi\eta} = \mu + \epsilon_{\xi\cdot} + \epsilon_{\cdot\eta} + \epsilon_{\xi\eta}$

where $\mu$ is the population mean, and where the $\epsilon_{\xi\cdot}$, $\epsilon_{\cdot\eta}$, and $\epsilon_{\xi\eta}$ are random
variables having zero means and zero covariances.

The variance $\sigma^2$ of $x_{\xi\eta}$ satisfies (10.8.1). The main problem of the Model
II analysis is to estimate $\sigma^2_{\cdot 0}$, $\sigma^2_{0\cdot}$, and $\sigma^2_{\cdot\cdot}$ from the sample $\{x_{\xi\eta}\}$.

For this purpose, we adopt the definitions of $\bar{x}_{\cdot\cdot}$, $\bar{x}_{\xi\cdot}$, $\bar{x}_{\cdot\eta}$ as given in
(8.6.7), and of $S_{\cdot 0}$, $S_{0\cdot}$, $S_{\cdot\cdot}$ given in (8.6.9). Under the assumptions of
8.6.2 we have, as given in (8.6.11),


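(10.8.3)
    $\mathscr{E}(S_{\cdot 0}) = (r - 1)(s\sigma^2_{\cdot 0} + \sigma^2_{\cdot\cdot})$
    $\mathscr{E}(S_{0\cdot}) = (s - 1)(r\sigma^2_{0\cdot} + \sigma^2_{\cdot\cdot})$
    $\mathscr{E}(S_{\cdot\cdot}) = (r - 1)(s - 1)\sigma^2_{\cdot\cdot}$.

Applying 10.1.1 (or 10.7.1) we therefore obtain the following unbiased
estimators for $\sigma^2_{\cdot 0}$, $\sigma^2_{0\cdot}$, and $\sigma^2_{\cdot\cdot}$:

(10.8.4)
    $\mathscr{E}^{-1}(\sigma^2_{\cdot 0}) = \frac{1}{s}\left[\frac{S_{\cdot 0}}{r - 1} - \frac{S_{\cdot\cdot}}{(r - 1)(s - 1)}\right]$
    $\mathscr{E}^{-1}(\sigma^2_{0\cdot}) = \frac{1}{r}\left[\frac{S_{0\cdot}}{s - 1} - \frac{S_{\cdot\cdot}}{(r - 1)(s - 1)}\right]$
    $\mathscr{E}^{-1}(\sigma^2_{\cdot\cdot}) = \frac{S_{\cdot\cdot}}{(r - 1)(s - 1)}$.

The estimator for $\mu$, the population mean, is $\bar{x}_{\cdot\cdot}$, whose variance is
given by $\sigma^2_{\cdot 0}/r + \sigma^2_{0\cdot}/s + \sigma^2_{\cdot\cdot}/rs$ as stated in (8.6.12). An unbiased estimator
for this variance is therefore provided by replacing $\sigma^2_{\cdot 0}$, $\sigma^2_{0\cdot}$, and $\sigma^2_{\cdot\cdot}$ with
their respective estimators given in (10.8.4).

It is customary also to display the constituents in the Model II
description in an analysis of variance table as shown in Table 10.2.

Table 10.2 Model II Analysis of Variance Table for
Complete Two-Factor Experimental Design

Source of            Degrees of         Sum of            Mean Sum of                            Mean value of
Variation            Freedom (D.F.)     Squares (S.S.)    Squares (M.S.S.)                       M.S.S.

Rows                 $r - 1$            $S_{\cdot 0}$     $S_{\cdot 0}/(r - 1)$                  $s\sigma^2_{\cdot 0} + \sigma^2_{\cdot\cdot}$
Columns              $s - 1$            $S_{0\cdot}$      $S_{0\cdot}/(s - 1)$                   $r\sigma^2_{0\cdot} + \sigma^2_{\cdot\cdot}$
Residual (error)     $(r - 1)(s - 1)$   $S_{\cdot\cdot}$  $S_{\cdot\cdot}/[(r - 1)(s - 1)]$      $\sigma^2_{\cdot\cdot}$
Total                $rs - 1$           $S_T$
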

Finally, the reader should note that for the case of finite populations of
R-factor levels and of C-factor levels, we find estimators for the finite
population versions of the variance components by replacing equations
(10.8.3) with (8.6.22) and applying 10.7.1. A table similar to Table 10.2
can, of course, also be drawn up for this case.
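For the infinite-population case, the step from the three sums of squares to the variance component estimators amounts to solving the three expected-value equations in (10.8.3). A minimal plain-Python sketch follows; the function name `variance_components_two_factor` and the numerical inputs are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def variance_components_two_factor(s_rows, s_cols, s_resid, r, s):
    """Unbiased estimates of the row, column, and residual variance components."""
    ms_rows = s_rows / (r - 1)
    ms_cols = s_cols / (s - 1)
    ms_resid = s_resid / ((r - 1) * (s - 1))

    var_resid = ms_resid
    var_rows = (ms_rows - ms_resid) / s   # may be negative in small samples
    var_cols = (ms_cols - ms_resid) / r   # may be negative in small samples

    # Estimated variance of the grand mean, using the three component estimates.
    var_mean = var_rows / r + var_cols / s + var_resid / (r * s)
    return {"rows": var_rows, "cols": var_cols, "resid": var_resid,
            "var_mean": var_mean}


# Hypothetical sums of squares for r = 4 rows and s = 5 columns.
vc = variance_components_two_factor(s_rows=90.0, s_cols=64.0, s_resid=36.0, r=4, s=5)
print(vc)
```

The estimators are unbiased but not constrained to be nonnegative, so a negative value for a row or column component can occur when the corresponding mean square falls below the residual mean square.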

(b) The Balanced Incomplete Two-Factor Experimental Design 

The sampling theory involved in the Model II treatment of the balanced
incomplete two-factor design is given in Section 8.6(d). This design selects
the subsample $\{x_{\xi\eta}\}^*$ of size $n = rs' = r's$ of response random variables
from the matrix sample $\{x_{\xi\eta}\}$ as defined in Section 8.6(d). Each element
in $\{x_{\xi\eta}\}^*$ is a random variable which is itself a linear combination of random
variables as stated by (10.8.2). Furthermore, the variance of each element
in $\{x_{\xi\eta}\}^*$ has the value $\sigma^2$ which, in turn, is the sum of three
components as stated in (10.8.1).

The main problem of a Model II analysis of a balanced incomplete
two-factor design is to devise unbiased estimators for $\mu$, and the variance
components $\sigma^2_{\cdot 0}$, $\sigma^2_{0\cdot}$, and $\sigma^2_{\cdot\cdot}$ by using the elements of the subsample
$\{x_{\xi\eta}\}^*$.

The mean $\bar{x}^*$ of the sample $\{x_{\xi\eta}\}^*$ is an unbiased estimator for $\mu$ which
has variance

(10.8.5)    $\sigma^2(\bar{x}^*) = \frac{\sigma^2_{\cdot\cdot}}{n} + \frac{\sigma^2_{\cdot 0}}{r} + \frac{\sigma^2_{0\cdot}}{s}$

as we have seen in (8.6.39).

It will be observed from the last three equations in (8.6.39) that those
equations have a unique solution for $\sigma^2_{\cdot 0}$, $\sigma^2_{0\cdot}$, and $\sigma^2_{\cdot\cdot}$ provided $r' > 1$
and $s' > 1$. Under these conditions we apply 10.1.1 (or 10.7.1), obtaining
the following estimators for $\sigma^2_{\cdot 0}$, $\sigma^2_{0\cdot}$, and $\sigma^2_{\cdot\cdot}$:

(10.8.6)

^ Sl±S^ _ 
n — s 

^-1(^2) ^ Sj_+_S| _ 
n — r 

= I [AS* + BS* + CS^] 

JC 

B = (^> 

(n -r) (n- s) 

C=A+B-1 

iC = n — r — s + 1. 

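An unbiased estimator for $\sigma^2(\bar{x}^*)$ is obtained by substituting the
estimators $\mathscr{E}^{-1}(\sigma^2_{\cdot 0})$, $\mathscr{E}^{-1}(\sigma^2_{0\cdot})$, and $\mathscr{E}^{-1}(\sigma^2_{\cdot\cdot})$ for $\sigma^2_{\cdot 0}$, $\sigma^2_{0\cdot}$, and $\sigma^2_{\cdot\cdot}$,
respectively, in (10.8.5).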

We leave it to the reader as an exercise to set up a Model II analysis of 
variance table similar to Table 10.2 for the incomplete two-factor design. 

(c) The Complete Three-Factor Experimental Design 

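The estimation of variance components in the complete three-factor
design follows from the underlying sampling theory of third-order matrix
samples given in Section 8.6(c). We shall use the notation of that section.
The sample of response random variables associated with the complete
three-factor design is the matrix sample $\{x_{\xi\eta\zeta}\}$ referred to in Section 8.6(c).
We are assuming, of course, that the r R-factor levels, the s C-factor levels,
and the t L-factor levels are samples from infinite populations. Thus,
each element $x_{\xi\eta\zeta}$ of the sample $\{x_{\xi\eta\zeta}\}$ is the sum of random variables
having zero means and zero covariances as stated in (8.6.30), that is,

(10.8.7)    $x_{\xi\eta\zeta} = \mu + \epsilon_{\xi\cdot\cdot} + \epsilon_{\cdot\eta\cdot} + \epsilon_{\cdot\cdot\zeta} + \epsilon_{\xi\eta\cdot} + \epsilon_{\xi\cdot\zeta} + \epsilon_{\cdot\eta\zeta} + \epsilon_{\xi\eta\zeta}$

the variance $\sigma^2$ of $x_{\xi\eta\zeta}$ being the sum of seven components as stated by
(8.6.27) or (8.6.28), that is,

(10.8.8)    $\sigma^2 = \sigma^2_{\cdot 00} + \sigma^2_{0\cdot 0} + \sigma^2_{00\cdot} + \sigma^2_{\cdot\cdot 0} + \sigma^2_{\cdot 0\cdot} + \sigma^2_{0\cdot\cdot} + \sigma^2_{\cdot\cdot\cdot}$.

Now, applying 10.1.1 (or 10.7.1) we obtain estimators for these
components from $S_{\cdot 00}$, $S_{0\cdot 0}$, $S_{00\cdot}$, $S_{\cdot\cdot 0}$, $S_{\cdot 0\cdot}$, $S_{0\cdot\cdot}$, $S_{\cdot\cdot\cdot}$ defined in (8.6.31),
by solving equations (8.6.34) for the variance components. This gives

(10.8.9)
    $\mathscr{E}^{-1}(\sigma^2_{\cdot\cdot\cdot}) = \frac{S_{\cdot\cdot\cdot}}{(r - 1)(s - 1)(t - 1)}$
    $\mathscr{E}^{-1}(\sigma^2_{0\cdot\cdot}) = \frac{S_{0\cdot\cdot}}{r(s - 1)(t - 1)} - \frac{1}{r}\mathscr{E}^{-1}(\sigma^2_{\cdot\cdot\cdot})$
    $\mathscr{E}^{-1}(\sigma^2_{\cdot 0\cdot}) = \frac{S_{\cdot 0\cdot}}{s(r - 1)(t - 1)} - \frac{1}{s}\mathscr{E}^{-1}(\sigma^2_{\cdot\cdot\cdot})$
    $\mathscr{E}^{-1}(\sigma^2_{\cdot\cdot 0}) = \frac{S_{\cdot\cdot 0}}{t(r - 1)(s - 1)} - \frac{1}{t}\mathscr{E}^{-1}(\sigma^2_{\cdot\cdot\cdot})$
    $\mathscr{E}^{-1}(\sigma^2_{00\cdot}) = \frac{S_{00\cdot}}{rs(t - 1)} - \frac{1}{s}\mathscr{E}^{-1}(\sigma^2_{0\cdot\cdot}) - \frac{1}{r}\mathscr{E}^{-1}(\sigma^2_{\cdot 0\cdot}) - \frac{1}{rs}\mathscr{E}^{-1}(\sigma^2_{\cdot\cdot\cdot})$
    $\mathscr{E}^{-1}(\sigma^2_{0\cdot 0}) = \frac{S_{0\cdot 0}}{rt(s - 1)} - \frac{1}{t}\mathscr{E}^{-1}(\sigma^2_{0\cdot\cdot}) - \frac{1}{r}\mathscr{E}^{-1}(\sigma^2_{\cdot\cdot 0}) - \frac{1}{rt}\mathscr{E}^{-1}(\sigma^2_{\cdot\cdot\cdot})$
    $\mathscr{E}^{-1}(\sigma^2_{\cdot 00}) = \frac{S_{\cdot 00}}{st(r - 1)} - \frac{1}{t}\mathscr{E}^{-1}(\sigma^2_{\cdot 0\cdot}) - \frac{1}{s}\mathscr{E}^{-1}(\sigma^2_{\cdot\cdot 0}) - \frac{1}{st}\mathscr{E}^{-1}(\sigma^2_{\cdot\cdot\cdot})$.
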


An unbiased estimator for $\mu$ is $\bar{x}_{\cdot\cdot\cdot}$, whose variance is given by formula
(8.6.35). An unbiased estimator for $\sigma^2(\bar{x}_{\cdot\cdot\cdot})$ is, of course, obtained by
replacing the variance components in (8.6.35) by their estimators in
(10.8.9).
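The successive solution for the seven components can be sketched in plain Python as below. The labels 'R', 'C', 'L', 'RC', 'RL', 'CL', 'RCL' and both function names are hypothetical. The helper `expected_sums_of_squares` encodes the usual expected mean squares of a three-factor layout with all levels sampled from infinite populations; this is an assumption standing in for equations (8.6.34), which are not reproduced in this section, and it is used only to check that the solution recovers the components it started from.

```python
def three_factor_components(ss, r, s, t):
    """Variance-component estimates from the seven sums of squares in `ss`."""
    e = {}
    e["RCL"] = ss["RCL"] / ((r - 1) * (s - 1) * (t - 1))
    e["CL"] = ss["CL"] / (r * (s - 1) * (t - 1)) - e["RCL"] / r
    e["RL"] = ss["RL"] / (s * (r - 1) * (t - 1)) - e["RCL"] / s
    e["RC"] = ss["RC"] / (t * (r - 1) * (s - 1)) - e["RCL"] / t
    e["L"] = (ss["L"] / (r * s * (t - 1))
              - e["CL"] / s - e["RL"] / r - e["RCL"] / (r * s))
    e["C"] = (ss["C"] / (r * t * (s - 1))
              - e["CL"] / t - e["RC"] / r - e["RCL"] / (r * t))
    e["R"] = (ss["R"] / (s * t * (r - 1))
              - e["RL"] / t - e["RC"] / s - e["RCL"] / (s * t))
    return e


def expected_sums_of_squares(v, r, s, t):
    """Assumed mean values of the sums of squares for given components `v`."""
    return {
        "RCL": (r - 1) * (s - 1) * (t - 1) * v["RCL"],
        "RC": (r - 1) * (s - 1) * (t * v["RC"] + v["RCL"]),
        "RL": (r - 1) * (t - 1) * (s * v["RL"] + v["RCL"]),
        "CL": (s - 1) * (t - 1) * (r * v["CL"] + v["RCL"]),
        "R": (r - 1) * (s * t * v["R"] + t * v["RC"] + s * v["RL"] + v["RCL"]),
        "C": (s - 1) * (r * t * v["C"] + t * v["RC"] + r * v["CL"] + v["RCL"]),
        "L": (t - 1) * (r * s * v["L"] + s * v["RL"] + r * v["CL"] + v["RCL"]),
    }


# Hypothetical components; feeding their expected sums of squares back in
# should return the same values.
true_v = {"R": 2.0, "C": 1.5, "L": 0.5, "RC": 0.8, "RL": 0.6, "CL": 0.4, "RCL": 1.0}
recovered = three_factor_components(expected_sums_of_squares(true_v, 3, 4, 5), 3, 4, 5)
print(recovered)
```

Solving from the residual upward, interactions before main effects, keeps each step a single subtraction, which is the pattern of the estimators in (10.8.9).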

The reader will find it instructive to set up a Model II analysis of variance 
table similar to Table 10.2 for the complete three-factor design. 

We also remark that estimation of variance components in a complete
three-factor design for the case of finite populations of R-factor, C-factor,
and L-factor levels is straightforward and will be left to the reader.


(d) The Latin Square Experimental Design 

As we have pointed out in Section 8.6(d) and Section 10.6(c), a Latin
Square design is a special case of a balanced incomplete third-order matrix
sample, namely, that for $s = t = r$ and $s' = t' = r' = 1$. The sample
$\{x_{\xi\eta\zeta};\ (\xi, \eta, \zeta) \in G^*\}$ of response random variables for a Latin Square
design, where $G^*$ is defined in Section 8.6(d), is such that each $x_{\xi\eta\zeta}$ in this
sample is of form (10.8.7) and its variance $\sigma^2$ consists of the seven
components given in (10.8.8). However, if the total sum of squares $S_T$ for
the sample is decomposed into the sums of squares $S^*_{\cdot 00}$, $S^*_{0\cdot 0}$, $S^*_{00\cdot}$ and $S^*_e$
as defined in (8.6.40), it is only possible, as may be seen in (8.6.41), to
estimate the following components in (10.8.8):

(10.8.10)    $\sigma^2_{\cdot 00}, \quad \sigma^2_{0\cdot 0}, \quad \sigma^2_{00\cdot}, \quad \sigma^2_e$

where $\sigma^2_e = \sigma^2_{\cdot\cdot 0} + \sigma^2_{\cdot 0\cdot} + \sigma^2_{0\cdot\cdot} + \sigma^2_{\cdot\cdot\cdot}$.

In other words, we can estimate only the row, column, and layer
components individually, and the sum of the remaining four components.
These estimates are obtained, of course, by applying 10.7.1 to (8.6.41),
which gives:

r — 1 

( 10 . 8 . 11 ) 

_ ^-00 + SqO ~h ^00- __ 

(2r - 1) (2r - l)(r - 1) ' 

A Model II analysis of variance table similar to Table 10.2 for the 
Latin Square design is straightforward and we leave it to be set up as 
an exercise. 


10.9 LINEAR ESTIMATORS FOR MEANS OF 
STRATIFIED POPULATIONS 

(a) Definitions and Notation 

In many practical problems of sampling from a finite population $\pi_N$,
the population is decomposed into m disjoint strata of sizes $N_1, \ldots, N_m$,
where $N_1 + \cdots + N_m = N$, and where the mean and variance of the g-th
stratum are $\mu_g$ and $\sigma_g^2$, $g = 1, \ldots, m$. If the mean and variance of $\pi_N$ are
$\mu$ and $\sigma^2$, then it can be verified that $\mu$ and $\sigma^2$ can be expressed in terms
of $N_g$, $\mu_g$, $\sigma_g^2$, $g = 1, \ldots, m$, as follows:

(10.9.1)    $\mu = \sum_g p_g \mu_g$

and

(10.9.2)    $\sigma^2 = \sigma_W^2 + \sigma_B^2$

where

(10.9.3)    $\sigma_W^2 = \sum_g \left(\frac{N_g - 1}{N - 1}\right) \sigma_g^2, \quad \sigma_B^2 = \sum_g \left(\frac{N_g}{N - 1}\right) (\mu_g - \mu)^2$

and

(10.9.4)    $p_g = \frac{N_g}{N}$.

Note that $\sigma^2$ is the sum of two components, namely, $\sigma_W^2$, the within-strata
component, and $\sigma_B^2$, the between-strata component.
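The decomposition of the population variance into within-strata and between-strata components can be checked on a small example. The sketch below is plain Python with hypothetical data; it assumes that the variance of a finite population of N elements is taken with divisor N - 1 (and each stratum variance with divisor equal to the stratum size less one), the convention under which the mean of a simple random sample of size n has variance (1/n - 1/N) times the population variance.

```python
def mean(values):
    return sum(values) / len(values)


def finite_variance(values):
    """Finite-population variance with divisor (size - 1), an assumed convention."""
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values) / (len(values) - 1)


# Hypothetical population split into three strata (illustrative values only).
strata = [[2.0, 4.0, 6.0],
          [10.0, 11.0, 15.0, 12.0],
          [1.0, 3.0]]
population = [v for stratum in strata for v in stratum]
N = len(population)

mu = mean(population)
sigma2 = finite_variance(population)

p = [len(stratum) / N for stratum in strata]            # stratum weights N_g / N
mu_from_strata = sum(pg * mean(st) for pg, st in zip(p, strata))
sigma2_within = sum((len(st) - 1) / (N - 1) * finite_variance(st) for st in strata)
sigma2_between = sum(len(st) / (N - 1) * (mean(st) - mu) ** 2 for st in strata)

print(mu, mu_from_strata)
print(sigma2, sigma2_within + sigma2_between)
```

Both printed pairs agree up to rounding error, which is the content of the two identities for the population mean and variance.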

If independent samples of sizes $n_1, \ldots, n_m$ are drawn from the m strata
respectively, let $\bar{x}_1, \ldots, \bar{x}_m$ be the sample means, and $s_1^2, \ldots, s_m^2$ the
sample variances of these samples. The strata samples, taken collectively,
where $n_1 + \cdots + n_m = n$, are called a stratified sample of size n from $\pi_N$.

The main problem here is to find optimum linear estimators for $\mu$ based
on means of samples from the strata for various amounts of information
given concerning $p_g$ and $\sigma_g^2$, $g = 1, \ldots, m$. Actually, we shall
consider the problem under two sets of conditions, namely:

(i) where the strata sizes $N_1, \ldots, N_m$ are known;

(ii) where the strata variances $\sigma_1^2, \ldots, \sigma_m^2$ are known in addition to
the strata sizes.

If the population $\pi_N$ is unstratified, we have already seen in Section 10.2
that the minimum variance linear estimator of $\mu$ from a simple random
sample from $\pi_N$ is $\bar{x}$ and the variance of $\bar{x}$ is $(1/n - 1/N)\sigma^2$. This result
will provide a "standard" against which the variances of estimators devised
for conditions (i) and (ii) can be compared.

Stratified sampling is widely used in sample surveys in government, 
business, and industry and there is a great deal of literature on the subject. 
Here we shall only give an introduction to the mathematics of this kind of 
sampling. The reader interested in additional reading is referred to books 
by Cochran (1953), Hansen, Hurwitz, and Madow (1953), Stephan and 
McCarthy (1958), Sukhatme (1954), and Yates (1949). 

(b) Linear Estimator for the Population Mean from a General Stratified 
Sample 

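In this case it is assumed that the strata sizes $N_1, \ldots, N_m$ are known,
and the strata sample sizes $n_1, \ldots, n_m$, each $\ge 2$, are specified. Let
$(x_{g1}, \ldots, x_{gn_g})$ be the elements in the sample from the g-th stratum,
$g = 1, \ldots, m$. This set of strata samples is called a general stratified
sample from $\pi_N$. Let L be a linear function of all sample elements, that is,

(10.9.5)    $L = \sum_{g=1}^{m} \sum_{\xi=1}^{n_g} c_{g\xi}\, x_{g\xi}$.

We wish to determine the $c_{g\xi}$ so that L is an unbiased estimator for $\mu$
having minimum variance. For L to be unbiased we require that the $c_{g\xi}$
be chosen so as to satisfy

    $\sum_{g=1}^{m} \sum_{\xi=1}^{n_g} c_{g\xi}\, \mu_g = \sum_g p_g \mu_g$,

that is,

(10.9.6)    $\sum_{\xi=1}^{n_g} c_{g\xi} = p_g, \quad g = 1, \ldots, m$

and also so as to minimize $\sigma^2(L)$. Since the strata samples are independent
we have

(10.9.7)    $\sigma^2(L) = \sum_g \sigma_g^2 \left[\sum_\xi c_{g\xi}^2 - \frac{1}{N_g}\left(\sum_\xi c_{g\xi}\right)^2\right]$.

It can be verified that the values of the $c_{g\xi}$ satisfying (10.9.6) and
minimizing (10.9.7) are

(10.9.8)    $c_{g\xi} = \frac{p_g}{n_g}, \quad \xi = 1, \ldots, n_g,\ g = 1, \ldots, m$.

When these values are substituted in (10.9.5), let the resulting general
stratified sample estimator for $\mu$ be denoted by $\mathscr{E}_G^{-1}(\mu)$. Then

(10.9.9)    $\mathscr{E}_G^{-1}(\mu) = p_1 \bar{x}_1 + \cdots + p_m \bar{x}_m$.

The variance of $\mathscr{E}_G^{-1}(\mu)$ is given by

(10.9.10)    $\sigma^2(\mathscr{E}_G^{-1}(\mu)) = \sum_g p_g^2 \left(\frac{1}{n_g} - \frac{1}{N_g}\right) \sigma_g^2$

from which it is seen that

(10.9.11)    $\sum_g p_g^2 \left(\frac{1}{n_g} - \frac{1}{N_g}\right) s_g^2$

is an unbiased estimator for $\sigma^2(\mathscr{E}_G^{-1}(\mu))$.
Summarizing,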

10.9.1 Let $\pi_N$ be a finite population with disjoint strata $\pi_{N_1}, \ldots, \pi_{N_m}$ of 
known sizes. Let the mean and variance of $\pi_{N_g}$ be $\mu_g$ and $\sigma_g^2$, 
$g = 1, \ldots, m$. Let $(x_{g1}, \ldots, x_{gn_g})$, $g = 1, \ldots, m$, be a general stratified sample 
from $\pi_N$. Let the sample mean and variance of $(x_{g1}, \ldots, x_{gn_g})$ be $\bar{x}_g$ and 
$s_g^2$, $g = 1, \ldots, m$, respectively. Then $\tilde{\mu}_{gs}$ given in (10.9.9) 
for the mean $\mu$ of $\pi_N$ is a minimum-variance linear unbiased estimator 
for $\mu$. The variance of $\tilde{\mu}_{gs}$ is given by (10.9.10), and an unbiased 
estimator for this variance is given by (10.9.11). 
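
As an illustration of 10.9.1, the following short Python sketch computes the estimate (10.9.9) and the variance estimate (10.9.11) from a general stratified sample. The function name `stratified_estimate` and its argument layout are assumptions made for this example only; the sample variance is taken with divisor $n_g - 1$, which is the convention under which (10.9.11) is unbiased.

```python
from statistics import mean, variance


def stratified_estimate(strata_sizes, samples):
    """Return (estimate, variance_estimate) following (10.9.9) and (10.9.11).

    strata_sizes: the known strata sizes N_1, ..., N_m.
    samples: one list of observed x-values per stratum (each of length >= 2).
    Both names are illustrative, not notation from the text.
    """
    N = sum(strata_sizes)
    estimate = 0.0
    variance_estimate = 0.0
    for N_g, x_g in zip(strata_sizes, samples):
        n_g = len(x_g)
        if n_g < 2:
            raise ValueError("each stratum sample needs at least 2 elements")
        p_g = N_g / N
        # (10.9.9): weight each stratum sample mean by p_g = N_g / N.
        estimate += p_g * mean(x_g)
        # (10.9.11): p_g^2 (1/n_g - 1/N_g) s_g^2, with s_g^2 using divisor n_g - 1.
        variance_estimate += p_g ** 2 * (1 / n_g - 1 / N_g) * variance(x_g)
    return estimate, variance_estimate
```

For instance, with strata of sizes 10 and 30 and samples `[1, 3]` and `[5, 5, 5]`, the weights are 0.25 and 0.75 and the second stratum contributes nothing to the variance estimate because its sample variance is zero.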

In the preceding discussion it should be noted that the strata sample 
sizes $n_1, \ldots, n_m$, each $\geq 2$, are fixed. It will be further noted that 
$\tilde{\mu}_{gs}$ is an unbiased estimator for the mean of $\pi_N$ for any choice of the strata 
sample sizes. 

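In the particular case where the strata sample sizes $n_1, \ldots, n_m$ are chosen 
to be proportional to the strata sizes, that is, if $n_g = p_g n$, $g = 1, \ldots, m$, 
we have a proportional stratified sample of size $n$ from $\pi_N$. (In a practical 
situation, of course, $p_g n$ would be rounded to the nearest integer.) For 
this choice of $n_g$, $g = 1, \ldots, m$, we shall denote $\tilde{\mu}_{gs}$ by $\tilde{\mu}_{ps}$. The 
variance of $\tilde{\mu}_{ps}$ is 

(10.9.10a)    $\sigma^2(\tilde{\mu}_{ps}) = \left( \frac{1}{n} - \frac{1}{N} \right) \sum_{g} p_g \sigma_g^2,$

whereas 

(10.9.11a)    $\left( \frac{1}{n} - \frac{1}{N} \right) \sum_{g} p_g s_g^2$

is an unbiased estimator for $\sigma^2(\tilde{\mu}_{ps})$. 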

The efficiency of $\tilde{\mu}_{ps}$ relative to $\bar{x}$, the mean of a simple random 
sample of size $n$ from $\pi_N$, in estimating the mean $\mu$ of $\pi_N$ is defined by the 
ratio 

(10.9.12) 
where 

(10.9.13) + 

It is evident from (10.9.12) that the advantage of proportional stratified 
sampling over random sampling for linearly estimating the mean of $\pi_N$ 
depends on the ratio of the between-strata component to the within-strata 
component of $\sigma^2$ in (10.9.13). If the between-strata component is 0, that is, if all strata have equal 
means, then proportional stratified sampling offers no advantage over 
simple random sampling. 

Hence 

10.9.2 If we choose a proportional stratified sample with $n_g = p_g n$, 
$g = 1, \ldots, m$, the resulting linear estimator $\tilde{\mu}_{ps}$ has variance 
(10.9.10a), of which (10.9.11a) is an unbiased estimator. Furthermore, 
the efficiency of $\tilde{\mu}_{ps}$ relative to $\bar{x}$, the mean of a random 
sample of size $n$ from $\pi_N$, as an estimator for the mean of $\pi_N$, 
is given by (10.9.12). 

(c) Minimum Variance Linear Estimator for the Population Mean if Sizes 
and Variances of Population Strata are Known 

If the values of $\sigma_1^2, \ldots, \sigma_m^2$ as well as $N_1, \ldots, N_m$ are known, we can 
make a "better" choice of strata sample sizes than those provided by 
proportional stratified sampling. The problem here amounts to minimizing 
$\sigma^2(\tilde{\mu}_{gs})$ as given by (10.9.10), subject to the condition that 
$n_1 + \cdots + n_m = n$. Allowing $n_1, \ldots, n_m$ for the moment to vary 
continuously, the reader can verify that the values of $n_1, \ldots, n_m$ satisfying 
the required conditions are as follows: 

(10.9.14)    $n_g = b_g n, \quad g = 1, \ldots, m$

where 

(10.9.15)    $b_g = \frac{p_g \sigma_g}{\sum_{h} p_h \sigma_h}.$

For this optimum choice of $n_1, \ldots, n_m$ let $\tilde{\mu}_{gs}$ be denoted by $\tilde{\mu}_{os}$. 
Then we have, after some simplification of (10.9.10), 

(10.9.10b)    $\sigma^2(\tilde{\mu}_{os}) = \frac{1}{n} \left( \sum_{g} p_g \sigma_g \right)^2 - \frac{1}{N} \sum_{g} p_g \sigma_g^2.$

(In a practical situation, one would, of course, choose for each $n_g$ the 
positive integer nearest $n b_g$.) 

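It can be verified that 

(10.9.16)    $\sigma^2(\tilde{\mu}_{os}) \leq \sigma^2(\tilde{\mu}_{ps}),$

the equality holding if and only if $\sigma_1 = \cdots = \sigma_m$. For by comparing 
(10.9.10a) and (10.9.10b), (10.9.16) holds if and only if 

$\frac{1}{n} \left( \sum_{g} p_g \sigma_g \right)^2 \leq \frac{1}{n} \sum_{g} p_g \sigma_g^2,$

that is, if and only if 

(10.9.17)    $\left( \sum_{g} p_g \sigma_g \right)^2 \leq \left( \sum_{g} p_g \right) \left( \sum_{g} p_g \sigma_g^2 \right)$

since $\sum_{g} p_g = 1$, the $p_g$ all being positive. 

But (10.9.17) is a special case of the well-known Schwarz inequality, and 
equality holds if and only if $\sigma_1 = \cdots = \sigma_m$. 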

A stratified sample from $\pi_N$ of size $n$ for which $n_g$ satisfies (10.9.14) is 
sometimes called an optimum stratified sample of size $n$ from $\pi_N$. 
Summarizing, we have the following result due to Neyman (1934): 

10.9.3 Let $\pi_N$ be a finite population with disjoint strata $\pi_{N_1}, \ldots, \pi_{N_m}$ 
whose sizes $N_1, \ldots, N_m$ and variances $\sigma_1^2, \ldots, \sigma_m^2$ are known. 
Then the stratified sample of size $n$ from $\pi_N$ which provides the 
minimum variance linear estimator $\tilde{\mu}_{os}$ for the mean of $\pi_N$ is 
that in which the strata sample sizes are $n_g$, as given by (10.9.14). 
The variance of $\tilde{\mu}_{os}$ is given by (10.9.10b). $\tilde{\mu}_{os}$ and $\tilde{\mu}_{ps}$ 
are equally efficient for estimating the mean of $\pi_N$ if and only if 
the strata variances are all equal. 
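
The comparison between proportional and optimum allocation can be checked numerically. The sketch below, in Python, evaluates (10.9.10a), (10.9.14) with (10.9.15), and (10.9.10b) for given strata proportions and standard deviations. The function names are assumptions for this illustration, and the allocation is left unrounded, as in the continuous treatment above.

```python
def proportional_variance(p, sigma, n, N):
    """Variance (10.9.10a) of the proportional stratified estimator."""
    return (1 / n - 1 / N) * sum(pg * sg ** 2 for pg, sg in zip(p, sigma))


def optimum_allocation(p, sigma, n):
    """Unrounded strata sample sizes n_g = b_g n from (10.9.14)-(10.9.15)."""
    total = sum(pg * sg for pg, sg in zip(p, sigma))
    return [n * pg * sg / total for pg, sg in zip(p, sigma)]


def optimum_variance(p, sigma, n, N):
    """Variance (10.9.10b) of the optimum stratified estimator."""
    weighted_sd = sum(pg * sg for pg, sg in zip(p, sigma))
    within = sum(pg * sg ** 2 for pg, sg in zip(p, sigma))
    return weighted_sd ** 2 / n - within / N
```

With two equally large strata having standard deviations 1 and 3, a total sample of 20 from a population of 1000 is allocated as 5 and 15, and the variance falls from 0.245 under proportional allocation to 0.195. When the standard deviations are equal the two variances coincide, in agreement with 10.9.3.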
(d) Minimum Variance Linear Estimator for the Population Mean for 
Fixed Total Cost 


In practical situations there are often considerable variations in the cost 
of sampling from the various strata. Under such conditions it is sometimes 
desirable to choose sample sizes from the several strata so as to minimize 
$\sigma^2(\tilde{\mu}_{gs})$ in (10.9.10) subject to a fixed value of over-all cost. More 
precisely, suppose $c_g$ is the cost of determining the value of each element 
in the sample of size $n_g$ from $\pi_{N_g}$. If $C$ is the total cost, then 

(10.9.18)    $C = c_1 n_1 + \cdots + c_m n_m.$

It is readily found that the minimum of $\sigma^2(\tilde{\mu}_{gs})$ subject to (10.9.18) 
occurs for 

(10.9.19)    $n_g = C B_g, \quad g = 1, \ldots, m,$

where 

(10.9.20)    $B_g = \frac{p_g \sigma_g / \sqrt{c_g}}{\sum_{h} \sqrt{c_h}\, p_h \sigma_h}.$

It is to be emphasized that this procedure assumes strata sizes $N_1, \ldots, N_m$ 
and strata variances $\sigma_1^2, \ldots, \sigma_m^2$ as well as the costs $c_1, \ldots, c_m$ are 
available (known). 

For the choice of the $n_g$ indicated in (10.9.19) let $\tilde{\mu}_{gs}$ be denoted by 
$\tilde{\mu}_{cs}$. Then we have 

(10.9.21)    $\sigma^2(\tilde{\mu}_{cs}) = \frac{1}{C} \left( \sum_{g} \sqrt{c_g}\, p_g \sigma_g \right)^2 - \frac{1}{N} \sum_{g} p_g \sigma_g^2.$
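
A minimal Python sketch of the fixed-cost allocation follows, assuming the strata proportions, standard deviations, and unit costs are supplied as plain lists. The function names are hypothetical and the sample sizes are again left unrounded. It evaluates (10.9.19) with (10.9.20) and the resulting variance (10.9.21).

```python
from math import sqrt


def cost_optimal_allocation(p, sigma, cost, C):
    """Unrounded n_g = C * B_g from (10.9.19)-(10.9.20)."""
    denom = sum(sqrt(cg) * pg * sg for pg, sg, cg in zip(p, sigma, cost))
    return [C * (pg * sg / sqrt(cg)) / denom for pg, sg, cg in zip(p, sigma, cost)]


def cost_optimal_variance(p, sigma, cost, C, N):
    """Variance (10.9.21) attained by the allocation above."""
    denom = sum(sqrt(cg) * pg * sg for pg, sg, cg in zip(p, sigma, cost))
    within = sum(pg * sg ** 2 for pg, sg in zip(p, sigma))
    return denom ** 2 / C - within / N
```

As a check on the design, the allocation always spends exactly the budget: multiplying each $n_g$ by $c_g$ and summing returns $C$. With two equal strata, equal standard deviations 2, costs 1 and 4, and $C = 60$, the cheaper stratum receives 20 elements and the dearer one 10.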


(e) Extension of Results to Stratified Sampling from Infinite Populations 

The results summarized in 10.9.1, 10.9.2, and 10.9.3 can be extended in 
a straightforward manner to the case of stratified sampling from an 
infinite population. Extensions to this case are left as exercises for the 
reader. 


10.10 LINEAR ESTIMATOR FOR MEAN OF STRATIFIED 
POPULATIONS IN TWO-STAGE SAMPLING 

(a) Two-Stage Sampling 

In situations where $\pi_N$ has a large number of strata, it may not be 
economically feasible to draw a sample from each stratum. In this case, a 
two-stage sampling procedure may be used, in which, first, a number of 
strata are designated by a random process with specified probabilities, 
and then elements are drawn by simple random sampling from each of 
these strata. The object of the sampling is to obtain a linear estimator 
for the mean $\mu$ of $\pi_N$, the variance of this estimator, and an estimator for this 
variance. 

For simplicity we shall consider the case where the size of each stratum 
is a multiple of an integer $v$, that is, 

(10.10.1)    $N_g = M_g v, \quad g = 1, \ldots, m.$

We shall refer to $v$ as the sampling unit. Thus, the $g$th stratum may be 
regarded as consisting of $M_g$ sampling units. The total number of sampling 
units in $\pi_N$ is therefore $M_1 + \cdots + M_m = M$, say. Therefore 

(10.10.2)    $N = Mv.$

We now consider a random process proposed by Wilks (1960b) for determining 
the number of sampling units to be drawn without replacement 
from each stratum of the population, subject to the condition that the total 
number of sampling units to be drawn is $u$. We assume that the process 
is such that each of the $M$ sampling units may be regarded as having the 
same probability of being drawn. (This can be realized in practice by the 
use of random numbers.) Let $\delta_g$ be a random variable denoting the number 
of sampling units to be drawn from $\pi_{N_g}$, $g = 1, \ldots, m$. $(\delta_1, \ldots, \delta_m)$ is 
therefore an $(m - 1)$-dimensional random variable having the $(m - 1)$-dimensional 
hypergeometric distribution with p.f. 

(10.10.3)    $p(\delta_1, \ldots, \delta_m) = \frac{\binom{M_1}{\delta_1} \cdots \binom{M_m}{\delta_m}}{\binom{M}{u}}$

where $\delta_1 + \cdots + \delta_m = u$. 

The result of the first stage of sampling, therefore, essentially tells us 
that a sample of size $\delta_g v$ is to be drawn from $\pi_{N_g}$, $g = 1, \ldots, m$. 

At the second stage of sampling, we draw samples of sizes $\delta_1 v, \ldots, \delta_m v$ 
from $\pi_{N_1}, \ldots, \pi_{N_m}$, respectively. This process provides a two-stage 
sample of size $uv$ from $\pi_N$ which can be regarded as a stratified sample 
where strata to be sampled have been chosen by a random procedure. 
In practice where the number of strata is large many of the $\delta_g$ would be 0, 
that is, many strata would not be sampled at all. The total number of 
strata sampled would be the number of nonzero $\delta$'s. 
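
The first stage can be realized with random numbers in a few lines. The Python sketch below is one possible realization, under the assumption that the numbers of sampling units $M_1, \ldots, M_m$ are given as a list; the function name `draw_first_stage` is made up for this example. It labels each of the $M$ sampling units by its stratum, draws $u$ of them without replacement with equal probability, and counts the draws per stratum.

```python
import random


def draw_first_stage(units_per_stratum, u, rng=random):
    """Return the counts (delta_1, ..., delta_m) of sampling units per stratum.

    units_per_stratum: the numbers M_1, ..., M_m of sampling units.
    u: total number of sampling units to draw without replacement.
    rng: a random.Random instance (or the random module) supplying the draws.
    """
    # One label per sampling unit, so every unit is equally likely to be drawn.
    labels = [g for g, M_g in enumerate(units_per_stratum) for _ in range(M_g)]
    counts = [0] * len(units_per_stratum)
    for g in rng.sample(labels, u):
        counts[g] += 1
    return counts
```

Drawing unit labels without replacement in this way gives counts that always sum to $u$ and never exceed the corresponding $M_g$, which is what the p.f. (10.10.3) requires.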
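(b) Linear Estimator for Population Mean 

Let $\tilde{\mu}_{2s}$ be the following linear estimator for $\mu$: 

$\tilde{\mu}_{2s} = \frac{(\delta_1 v) \bar{x}_1 + \cdots + (\delta_m v) \bar{x}_m}{uv},$

that is, 

(10.10.4)    $\tilde{\mu}_{2s} = \frac{\delta_1 \bar{x}_1 + \cdots + \delta_m \bar{x}_m}{u},$

where $\bar{x}_g$ is the mean of the sample of size $\delta_g v$ drawn from $\pi_{N_g}$. 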

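We have 

(10.10.5)    $\mathcal{E}(\tilde{\mu}_{2s}) = \sum_{g} \mathcal{E}\left( \frac{\delta_g \bar{x}_g}{u} \right).$

Making use of iterated mean values as stated by 3.7.2, we have for 
$g = 1, \ldots, m$ 

(10.10.6)    $\mathcal{E}(\delta_g \bar{x}_g) = \mathcal{E}[\delta_g\, \mathcal{E}(\bar{x}_g \mid \delta_g)]$

where $\mathcal{E}(\bar{x}_g \mid \delta_g)$ is the mean value of the conditional random variable 
$\bar{x}_g \mid \delta_g$, while $\mathcal{E}[\;]$ denotes unconditional mean value with respect to the 
random variable $\delta_g$. But 

$\mathcal{E}(\bar{x}_g \mid \delta_g) = \mu_g$

and it follows from (6.1.8) that 

$\mathcal{E}(\delta_g) = p_g u.$

Therefore, 

(10.10.7)    $\mathcal{E}(\tilde{\mu}_{2s}) = \sum_{g} p_g \mu_g = \mu$

which shows that $\tilde{\mu}_{2s}$ is an unbiased estimator for $\mu$, the mean of $\pi_N$. 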
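
The two stages and the estimator (10.10.4) can be put together in a short Python sketch. It assumes each stratum is given as a list of $x$-values whose length is a multiple of the sampling unit $v$; the function name `two_stage_estimate` and this data layout are assumptions of the example rather than notation from the text.

```python
import random
from statistics import mean


def two_stage_estimate(strata, v, u, rng=random):
    """Draw a two-stage sample and return the estimate (10.10.4).

    strata: list of lists of x-values; len(strata[g]) must equal M_g * v.
    v: the sampling unit (elements per unit); u: number of units to draw.
    """
    # First stage: every one of the M sampling units is equally likely.
    labels = [g for g, xs in enumerate(strata) for _ in range(len(xs) // v)]
    deltas = [0] * len(strata)
    for g in rng.sample(labels, u):
        deltas[g] += 1
    # Second stage: simple random sample of delta_g * v elements from stratum g.
    total = 0.0
    for g, d in enumerate(deltas):
        if d > 0:
            total += d * mean(rng.sample(strata[g], d * v))
    return total / u
```

Two degenerate cases give a quick check on the logic. If every element has the same value the estimate equals that value, and if $u = M$ every stratum is sampled completely, so the estimate equals the population mean.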

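The variance of $\tilde{\mu}_{2s}$ is defined by 

(10.10.8)    $\sigma^2(\tilde{\mu}_{2s}) = \mathcal{E}\left[ \sum_{g} \left( \frac{\delta_g \bar{x}_g}{u} - p_g \mu_g \right) \right]^2 = \mathcal{E}\left[ \sum_{g} \frac{\delta_g}{u} (\bar{x}_g - \mu_g) + \sum_{g} \left( \frac{\delta_g}{u} - p_g \right) \mu_g \right]^2.$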

Squaring this expression, and taking iterated mean values, we find after 
some simplification that 

By referring to (10.9.3) it will be seen that $\sigma^2(\tilde{\mu}_{2s})$ is a linear function 
of the three quantities 

(10.10.10)    $\sum_{g} p_g \sigma_g^2, \quad \sum_{g} \sigma_g^2, \quad \sum_{g} p_g (\mu_g - \mu)^2.$
If we let $G_1$, $G_2$, and $G_3$ be functions of the two-stage sample elements 
defined as follows, $v \geq 2$: 


( 10 . 10 . 11 ) 

Gi = I = 2 ^ sS; G 3 = 2 - <«’*■ W 

0 Ng g 

it can be verified in a straightforward manner that $\mathcal{E}(G_1)$, $\mathcal{E}(G_2)$, and 
$\mathcal{E}(G_3)$ are linear functions of the three quantities in (10.10.10), such that 
there exists a linear function of $G_1$, $G_2$, $G_3$ which is an unbiased estimator 
for $\sigma^2(\tilde{\mu}_{2s})$. Actually, this unbiased estimator is obtained by replacing 
the three quantities (10.10.10) in (10.10.9) by the following unbiased estimators 
respectively: 

( 10 . 10 . 12 ) 

^ = "TTJ 7: (^1 ■" ^2) 

u(N — 1) 




Nv 


_ ’ N - 2 

uv(N - 1) + nIuv(N - 1) 


(Gi - Ga) - 


Nu 4* 2v 


Go + 


G3 



We may summarize the preceding results as follows: 

10.10.1 Let $\pi_N$ be a finite population consisting of the disjoint strata 
$\pi_{N_1}, \ldots, \pi_{N_m}$, where $N_g = M_g v$, $g = 1, \ldots, m$ and $v$ is a given 
integer (sampling unit). Let a two-stage sample of $u$ sampling 
units be drawn from $\pi_N$. Then $\tilde{\mu}_{2s}$ is an unbiased linear 
estimator for the mean of $\pi_N$. The variance of $\tilde{\mu}_{2s}$ is given by 
(10.10.9), whereas the right-hand side of (10.10.9) with the three 
quantities (10.10.10) replaced by their estimators given in (10.10.12) is an unbiased 
estimator for $\sigma^2(\tilde{\mu}_{2s})$, for $v \geq 2$. 

(c) Case of Infinite Population 

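For the case of sampling from an infinite stratified population having 
probabilities $p_1, \ldots, p_m$, means $\mu_1, \ldots, \mu_m$, and variances $\sigma_1^2, \ldots, \sigma_m^2$ of 
the strata $\pi_1, \ldots, \pi_m$ respectively, the situation is as follows. 

The random variable $(\delta_1, \ldots, \delta_m)$ has the multinomial p.f. 

(10.10.13)    $p(\delta_1, \ldots, \delta_m) = \frac{u!}{\delta_1! \cdots \delta_m!}\, p_1^{\delta_1} \cdots p_m^{\delta_m}$

where $\delta_1 + \cdots + \delta_m = u$. 
The variance of $\tilde{\mu}_{2s}$ in this case has the following form: 

(10.10.14)    $\sigma^2(\tilde{\mu}_{2s}) = \frac{\sigma_W^2}{uv} + \frac{\sigma_B^2}{u}$

where 

$\sigma_W^2 = \sum_{g} p_g \sigma_g^2, \quad \sigma_B^2 = \sum_{g} p_g (\mu_g - \mu)^2,$

while 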


(10.10.15) 


'L+JL±i c, + _J!!_ c,-^2^4 

iin(ii + 1) ii(n + 1) iin(n + 1)» p. 


is an unbiased estimator for $\sigma^2(\tilde{\mu}_{2s})$ where $n = uv$. Note that $\sigma_{W'}^2 = 
\sigma_W^2 + O(1/N)$ and $\sigma_{B'}^2 = \sigma_B^2 + O(1/N)$. 


(d) Minimum Variance Linear Estimator for the Population Mean for 
Fixed Total Cost 

Suppose we hold the sample size $uv$ fixed, say let $uv = n$, and write 
$\sigma^2(\tilde{\mu}_{2s})$ as follows: 

(10.10.16) 

n \ N — V } 

where 

(10.10.17) A = oV, B = a%. 

N g 

It is evident from (10.10.16) that since $v$ must be a positive integer, the 
value of $v$ which minimizes $\sigma^2(\tilde{\mu}_{2s})$ is $v = 1$, provided, of course, that 
$A + NB > 0$ and there are $n$ strata. Here our two-stage sample from $\pi_N$ 
reduces to a simple random sample of size $n$ from $\pi_N$. In particular, it 
should be noted that in this instance the variance of $\tilde{\mu}_{2s}$ given by 
(10.10.9) reduces to $(1/n - 1/N)\sigma^2$ where $\sigma^2$ is given by (10.9.2). 

In view of this situation it is clear that the choice of values of $u$ and $v$ 
in a given situation cannot be based on the criterion of minimizing 
$\sigma^2(\tilde{\mu}_{2s})$, since this criterion would require that $v = 1$. In a practical 
situation the criterion for choice of $u$ and $v$ is to select a pair of values of 
$u$ and $v$ so as to minimize $\sigma^2(\tilde{\mu}_{2s})$ for a fixed total cost. In this case 
the simplest assumption is that the cost per sampling unit is $C_1$, whereas 
the cost per sample element is $C_2$. The total cost $C$, therefore, will be 

(10.10.18)    $C = C_1 u + C_2 uv.$

If $\sigma^2(\tilde{\mu}_{2s})$, as given by (10.10.9), is minimized subject to the restriction 
(10.10.18), it is found by allowing $u$ and $v$ to vary continuously that the 
minimizing value of $v$, say $\hat{v}$, is given by 

G=N^, H=N^-l 
C C 

and where $A$ and $B$ are given in (10.10.17). The corresponding value of 
$u$, say $\hat{u}$, would be found, of course, by substituting $\hat{v}$ for $v$ in (10.10.18) and 
solving for $u$. If we let $\tilde{\mu}_{2c}$ denote the estimator $\tilde{\mu}_{2s}$ under these 
conditions then, of course, the value of $\sigma^2(\tilde{\mu}_{2c})$ is given by (10.10.9) 
with $u$ and $v$ replaced by $\hat{u}$ and $\hat{v}$. Similarly, if these values of $\hat{u}$ and $\hat{v}$ 
are inserted in (10.10.12) for $u$ and $v$, we obtain an unbiased estimator for 
$\sigma^2(\tilde{\mu}_{2c})$. In the case of large $N$, we have 

( 10 . 10 . 21 ) v = + 

a - - 

Ci + 


Ob’ 


‘•{hi 


(10.10.19) 

where 

( 10 . 10 . 20 ) 


PROBLEMS 

10.1 Prove 10.2.1 for the case where $\bar{x}$ is the mean of a sample of size $n$ from 
a finite population of size $N$. 

10.2 Suppose $(x_1, \ldots, x_n)$ is an $n$-dimensional random variable having a 
distribution such that the mean of each $x$ is $\mu$, the variance of each $x$ is $\sigma^2$, and 
the covariance of each pair of $x$'s is $\rho\sigma^2$. Show that the minimum variance linear 
estimator for $\mu$ is the mean $\bar{x}$ of the $n$ random variables $x_1, \ldots, x_n$. 

10.3 If $\bar{x}$ and $\bar{x}'$ are means of independent samples of sizes $n$ and $n'$ from 
distributions having means $\mu$ and $\mu'$, and variances $\sigma^2$ and $\sigma'^2$ respectively, show 
that $\bar{x} - \bar{x}'$ is the minimum variance linear estimator for $\mu - \mu'$. 

10.4 If $T_1, \ldots, T_k$ are unbiased estimators of a parameter $\theta$ having (nonsingular) 
covariance matrix $\|\sigma_{ij}\|$ with inverse $\|\sigma^{ij}\|$, and if $l_1, \ldots, l_k$ are constants 
whose sum is 1, show that the minimum variance estimator of $\theta$ of form 
$\sum_{i} l_i T_i$ occurs for $l_i = \sum_{j} \sigma^{ij} / \sum_{i,j} \sigma^{ij}$, $i = 1, \ldots, k$, and that the variance of this 
estimator is $1 / \sum_{i,j} \sigma^{ij}$. 
10.5 Show that the conclusions of 10.2.2 are still true if it is assumed that 
$(x_1, \ldots, x_n)$ is an $n$-dimensional random variable whose c.d.f. $F(x_1, \ldots, x_n)$ is a 
symmetric function of $x_1, \ldots, x_n$. 

10.6 If $n_1$, $\bar{x}_1$, $s_1^2$ are the size, mean, and variance of a sample from the 
distribution $N(\mu_1, \sigma^2)$ and $n_2$, $\bar{x}_2$, $s_2^2$ are similar quantities for an independent 
sample from $N(\mu_2, \sigma^2)$, show that 

$(\bar{x}_1 - \bar{x}_2) \pm t_{n_1+n_2-2,\gamma}\, s \sqrt{1/n_1 + 1/n_2}$

are $100\gamma\%$ confidence limits for $\mu_1 - \mu_2$, where 

$s^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$ and $t_{n_1+n_2-2,\gamma}$ 

is defined by (10.2.10). 

10.7 Prove 10.3.2. 

10.8 If in 10.3.1 $(x_{11}, \ldots, x_{1n}), \ldots, (x_{k1}, \ldots, x_{kn})$ are chosen as unit vectors 
(constants) mutually orthogonal to each other, show that the minimum variance 
linear estimators for $\beta_1, \ldots, \beta_k$, respectively, all have 
variances equal to $\sigma^2$ and covariances equal to zero. 

10.9 Referring to 10.3.4, show that the interior of the ellipsoid 

i <(0, - w, - >,) - 

<,j-l V't — K) 

is a 100y% confidence region for (/Jj. ki < k, where 




and where 


x>ii 


a*i*i 


1-1 


fcj.n-lf.y 


i 






being the p.e. of the Snedecor distribution S{kiy n — k). 


10.10 If a sphere in has center (Uj,'..., a^) and radius r show that the 
equations of the two hyperplanes spaces) parallel to the hyperplane 
having equation 


and tangent to the sphere are 
10.11 Referring to Section 10.3, show that the probability is y that the 
inequalities 

2 ~ < X < X ^ fX^iM 

I 'V ij t t V 7j 

for all possible real vectors (q,..., Cj^), where q,..., c;b are not all zero, are 
simultaneously satisfied where 


Pi 9 Si and ^k,n-k,Y being defined in 10.3.4 and (10.4.4). 

10.12 Suppose (wi, ..., Wjfc) has the distribution lk,ill), where 

(/*!,.,,, f^k) are unknown and |lcr^, l| are known. Let xl.y point of 

the chi-square distribution with A:-degrees of freedom. Show that the probability 
is y that the inequalities 


2 CiUi - Xk.yjX < X < X +Xk.Yj'^ 

for all possible real vectors (cj,..., c*), where ..., c* are not all zero, are 
simultaneously satisfied. 

10.13 Verify formulas (10.6.5) and (10.6.6). 

10.14 In Problem 8.20 show how to obtain a $100\gamma\%$ confidence interval for 

10.15 Verify (10.6.15) and (10.6.16). 

10.16 Formulate statements corresponding to 10.6.1, 10.6.2, and set up a 
Model I analysis of variance table for the complete three-way experimental 
design. 

10.17 Formulate statements corresponding to 10.6.1 and 10.6.2, and set up 
a Model I analysis of variance table for the Latin Square experimental design. 

10.18 Set up a Model II analysis of variance table for the incomplete balanced 
two-factor experimental design described in Section 10.8(b). 

10.19 Verify that the variance component estimators (10.8.6) for the incomplete 
balanced two-factor experimental design reduce to those in (10.8.4) for the 
complete two-factor design if $s' = s$ and $r' = r$. 

10.20 Write down the estimators corresponding to (10.8.4) and the table 
corresponding to Table 10.2 for the case of finite populations of $N_1$ $R$-factor 
levels and of $N_2$ $C$-factor levels. 

10.21 Set up the Model II analysis of variance table for the complete three- 
factor experimental design. 

10.22 Set up the Model II analysis of variance table for the Latin-Square 
experimental design. 

10.23 In the complete two-factor Model I experimental design if » 0, 
Tj » 1,..., 5, the experimental design may be described as .^-replicates of a 
complete one-factor design. In this case show that S*llr(s — 1)] is an unbiased 
estimator for (7*, where S* « S,. + 5o.(0). Assuming normality of distribution 
as in 10.6.2, show that 1007 ^% confidence intervals for fi and in this case are 

m ± 

mf. ± - 0] 

and that 

subject to 2 (ji^. — m^.) =» 0, is an (r — l)-dimensional spherical region which 
gives a 100>'% confidence region for (mi., ..., 

10.24 In the complete three-factor ^odel I experimental design, if 

/i^.{ are all zero, we have t replicates of a complete two-factor experiment of r 
rows and s columns. In this case show that S*.l[rsit — 1)] is an unbiased 
estimator for a\ where 5*. = 5... + Sq^CO) -f S’.q.CO) -h Sqo.CO). Also show 
that 

^ ‘S'.*, p ^ *^.000“^.). ^ ‘S’o.oO^.,,.) 

are independently distributed according to chi-square distributions with rs(t — 1), 
(r — l)(j — 1), (r — 1), (s — 1) degrees of freedom, respectively. Set up a 
Model I analysis of variance table to test the hypotheses: (i) that all /u^. are zero; 
(ii) that all are zero; (iii) that all are zero. 

10.25 Referring to Section 10.6(^), show that the probability exceeds y that 
the confidence intervals 

(m^. - m^,.) db <5v'2(r - 2)1 rs 

all contain (pi^. — S ^ * 1,...»r respectively, where 

and .^(r_i)(r.i)(f.i)y is the 100}/% point on the Snedecor distribution 
S((r - 1), (r - iXs - D). 

10.26 Suppose ..., are the means and 5f,..., the sample variances 

of independent samples of sizes itx,..., from N(mi^ a*),.,., <r*) 

respectively. Show that the probab ility exceed s y that all of the lk(k — 1) 
confidence intervals — £{) ± dVljn^ + ijn^ contain Cu^ — ^u^), i >y * 1, 
..., A: respectively, where 

^ - 1>I +••+(»» - 1>S 

and ^ 100}/% point of the Snedecor distribution 5(A:, n ~ k), 

10.27 Referring to the notation and definitions of Sections 10.5(c) and 
10.6(a) show that the probability exceeds $\gamma$ that the inequalities 


— Z) < — /*^.,) < (£^, — ^|.,) -I- D 
D 




- 2)5’.. 


1 )(^~ 1 )* 


10.28 Referring to the notation of Sections 10.5(c) and 14.6(ft) show that for a 
fixed ri the probability exceeds y that the inequalities 


(">{,. - - D < (Wf,. - + D 

hold simultaneously for all choices of ^ ^ I' = 1 .r — 1 where 


D 




(r - 2)5... 
rst(r - l)(r ~ 1 ) • 


10.29 Referring to Section 10.9(a) suppose neither the strata sizes nor strata 
means nor strata variances are known and that the following double-sampling 
scheme is used; The first sample is a simple random sample of size r from 
which yields '‘i* • • •»elements from ,.. ., , respectively, where 

Tj + • • • + = r. Then a stratified sample of size n is drawn from so that 

/fi = (ri/r)/i,..., = (r^/r)/i elements are drawn at random, respectively, from 

X.Le‘ 


x' 



Xg 


where is the mean of the sample of size from . Show that x' is an 
unbiased estimator for the mean fi of Also show that the variance of x' is 
given by 


- 1 ) +«1 2 

“ L ^ 


H-» — 

rN ■“ 


n — r 


rnN(N - I),?!*"' 


10.30 {Continuation) 
I IN, then 


Suppose N is large enough to neglect terms of order 


c^x')^ 



n 



where and are defined in (10.10.14). 

Suppose the cost of drawing and observing an element in the first sample is 
and that of drawing and observing an element in the second sample is Cg. For a 
fixed total cost c = c^r + determine the choices of r and n which minimize 
a\£'), 

10.31 If $(x_1, \ldots, x_n)$ is a sample from a distribution having mean $\mu$ and 
variance $\sigma^2$, and if $u$ is any unbiased linear estimator for $\mu$, show that the 
correlation coefficient between $\bar{x}$ and $u$ is $\sigma(\bar{x})/\sigma(u)$. 

10.32 Suppose $L_1$ and $L_2$ are two unbiased linear estimators of a population 
mean constructed from a sample. Let the variances of $L_1$ and $L_2$ be $\sigma_1^2$ and $\sigma_2^2$ and 
let the correlation coefficient between $L_1$ and $L_2$ be $\rho$. Determine constants 
$c_1 > 0$, $c_2 > 0$, and $c_1 + c_2 = 1$, so that $c_1 L_1 + c_2 L_2$ has minimum variance. 

10.33 If $n$, $\bar{x}$, $s^2$ are the size, mean, and variance of a sample from a normal 
distribution show that the probability is $\gamma$ that the next independent drawing from 
the same normal distribution gives an $x$ in the interval $\bar{x} \pm t_{n-1,\gamma}\, s \sqrt{(n + 1)/n}$ 
where $t_{n-1,\gamma}$ is defined by (10.2.10). 

10.34 Sheppard's corrections for moments. Suppose (a^i,..., a; J is a sample 
from a p.d.f./(rr). Let /r be a random variable independent of (x ^,..., ^n) 
having the rectangular distribution /?(0, 6). Let /„ be the interval (A + — \d^ 

/r + <5a + i<5], a «..., —2, —1, 0, -1-1, -f-I,... and let n^ be thenumber of 
sample elements which fall in 4 . The quantity 

f^T.i - ^ 2 

" a 


is seen to be the rth moment of the sample if the sample elements are rounded to 
the nearest unit of size <5 using an arbitrary (and “randomly placed”) origin. 
Show that the characteristic function ^<0 of ^ is given by 

y(f) a, 1J ^ j dh 

where p^ * I /(a?) dx. Using this characteristic function show that 

= /^2 + ^/12; ^(Mi,) = + (5Vi/4 

and in general 

am;.,) = [A* + - Ax - 


where /i 2 > • • * moments of x and 

r + 00 

^{x ± ^<5)’’+^ * I (a; ± i<5)»‘+i/(a:) dx, 

J— 00 

Thus, — <5*/12, Afg ^ — hf{^d^l4, • • • are unbiased estimators for 

Ml, Mi, Mi ,.... [Sheppard (1898)]. 

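10.35 Suppose $f(x_1, \ldots, x_k)$ is a function such that $0 \leq f(x_1, \ldots, x_k) \leq a_{k+1}$ 
for every point $(x_1, \ldots, x_k)$ in the $k$-dimensional interval $I_k = \{(x_1, \ldots, x_k)$: 
$0 \leq x_i \leq a_i$, $i = 1, \ldots, k\}$ and such that the following integral exists 

$J = \int_0^{a_1} \cdots \int_0^{a_k} f(x_1, \ldots, x_k)\, dx_1 \cdots dx_k.$

Let $x_{1\xi}, \ldots, x_{k+1,\xi}$ be independent random variables from the rectangular 
distributions $R(\tfrac{1}{2} a_1, a_1), \ldots, R(\tfrac{1}{2} a_{k+1}, a_{k+1})$, $\xi = 1, \ldots, n$. Let $z_\xi$ be a random 
variable with value 1 if $x_{k+1,\xi} \leq f(x_{1\xi}, \ldots, x_{k\xi})$ and 0 if $x_{k+1,\xi} > f(x_{1\xi}, \ldots, x_{k\xi})$, 
$\xi = 1, \ldots, n$, and let 

$\bar{z} = \frac{1}{n} \sum_{\xi=1}^{n} z_\xi.$

Show that $(a_1 \cdots a_{k+1}) \bar{z}$ is an unbiased estimator for $J$ having variance not 
exceeding 

$\frac{(a_1 \cdots a_{k+1})^2}{4n}.$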
Nonparametric Statistical Estimation 


11.1 INTRODUCTORY REMARKS 

In Chapter 10 we considered problems of devising linear and quadratic 
estimators for such population parameters as means, variances, covariances, 
and regression coefficients of finite and infinite populations. The process 
of linear estimation on which such estimators are based is, as we have 
seen, a relatively simple procedure which requires only mild assumptions 
about the random variables that enter into the linear estimators; namely, 
finiteness of means and of covariance matrices of these random variables. 

In this chapter we shall consider another class of rather simple estimators 
for quantiles and functions of quantiles of the c.d.f. of the population 
under consideration. These estimators, as we shall see, do not 
depend on the functional form of the c.d.f. of the population from which 
the sample is drawn. Estimators of this type are sometimes called nonparametric 
estimators to contrast them with parametric estimators discussed 
in Chapter 12. Parametric estimators, as we shall see in Chapter 12, are 
encountered in dealing with the problem of estimating values of unknown 
parameters in a c.d.f. having a specified functional form and depending on 
these parameters. 

The basic random variables involved in nonparametric estimation are 
the order statistics determined by a sample, and the coverages determined 
by these order statistics. The sampling theory of order statistics and 
coverages has been discussed in Section 8.7, and some of these results are 
used in the succeeding sections. 

11.2 CONFIDENCE INTERVALS FOR QUANTILES 
(a) Case of Small Samples 

Suppose we have an infinite population with a continuous c.d.f. $F(x)$. 
We shall consider first the problem of determining confidence intervals for 
a given quantile from the order statistics of a sample from $F(x)$. 

More specifically, let $(x_1, \ldots, x_n)$ be a sample from $F(x)$ and let $(x_{(1)}, \ldots, 
x_{(n)})$ be the order statistics of the sample from $F(x)$ as defined in Section 
8.7. Let $x_{(k_1)}$ and $x_{(k_1+k_2)}$ be any two order statistics of the sample. We 
know from 8.7.3 that the random variables $u = F(x_{(k_1)})$, $v = F(x_{(k_1+k_2)})$, 
which are sums of coverages, have the ordered bivariate Dirichlet distribution 
$D^*(k_1, k_2; n - k_1 - k_2 + 1)$ and hence the p.e. of $(u, v)$ is 

(11.2.1)    $f(u, v)\, du\, dv = \frac{\Gamma(n + 1)}{\Gamma(k_1) \Gamma(k_2) \Gamma(n - k_1 - k_2 + 1)}\, u^{k_1 - 1} (v - u)^{k_2 - 1} (1 - v)^{n - k_1 - k_2}\, du\, dv$

for $0 < u < v < 1$, and 0 elsewhere. Now consider the inequality 

(11.2.2)    $F(x_{(k_1)}) < p < F(x_{(k_1+k_2)}).$

Since $F(x)$ is continuous, (11.2.2) is satisfied if and only if 

(11.2.3)    $x_{(k_1)} < x_p < x_{(k_1+k_2)}.$

Hence the probability that (11.2.3) holds is equal to the probability that 
(11.2.2) holds, and therefore we can write 

(11.2.4)    $P(x_{(k_1)} < x_p < x_{(k_1+k_2)}) = \int_p^1 \int_0^p f(u, v)\, du\, dv.$

But 

(11.2.5)    $\int_p^1 \int_0^p f(u, v)\, du\, dv = \int_0^p \int_u^1 f(u, v)\, dv\, du - \int_0^p \int_0^v f(u, v)\, du\, dv.$

By applying the transformations 

$u = r, \quad v = 1 - s(1 - r)$

and 

$u = rs, \quad v = s,$

respectively, to the first and second integrals on the right of (11.2.5), we 
find 

(11.2.6)    $P(x_{(k_1)} < x_p < x_{(k_1+k_2)}) = I_p(k_1, n - k_1 + 1) - I_p(k_1 + k_2, n - k_1 - k_2 + 1)$

where $I_p(\nu_1, \nu_2)$ is the incomplete beta function 

(11.2.7)    $I_p(\nu_1, \nu_2) = \frac{\Gamma(\nu_1 + \nu_2)}{\Gamma(\nu_1) \Gamma(\nu_2)} \int_0^p x^{\nu_1 - 1} (1 - x)^{\nu_2 - 1}\, dx.$

It should be particularly observed that the probability that the interval 
$(x_{(k_1)}, x_{(k_1+k_2)})$ contains the quantile $x_p$ does not depend on $F(x)$, and that 
this interval is therefore a confidence interval for $x_p$ having confidence coefficient given 
by the right-hand side of (11.2.6). 
We may summarize as follows: 

11.2.1 If $(x_1, \ldots, x_n)$ is a sample from a continuous c.d.f. $F(x)$ and if 
$x_{(k_1)}$ and $x_{(k_1+k_2)}$ are the $k_1$th and $(k_1 + k_2)$th order statistics of the 
sample, then $(x_{(k_1)}, x_{(k_1+k_2)})$ is a confidence interval for the quantile 
$x_p$ having confidence coefficient $I_p(k_1, n - k_1 + 1) - 
I_p(k_1 + k_2, n - k_1 - k_2 + 1)$. 

It should be noted that since $k_1$ and $k_2$ are (positive) integers it is possible 
to find confidence intervals for $x_p$ by using two order statistics $x_{(k_1)}$ and 
$x_{(k_1+k_2)}$ only for those values of the confidence coefficient $\gamma$ which can be 
taken on by the right-hand side of (11.2.6). One can, of course, set up 
confidence intervals whose confidence coefficients are at least $\gamma$ for certain 
ranges of values of $\gamma$. Furthermore, the two order statistics we have 
discussed are arbitrary; (11.2.6) holds for any two order statistics $x_{(k_1)}$ 
and $x_{(k_1+k_2)}$. Ordinarily, for a given $\gamma$, we would choose order statistics 
whose ranks are as close together as possible, that is, choose $k_1$ and $k_2$ 
so that $k_2$ is as small as possible. For instance, in setting up confidence 
intervals for the median $x_{0.5}$, we would select the largest value of $k$ for 
which 

(11.2.8)    $P(x_{(k)} < x_{0.5} < x_{(n-k+1)}) \geq \gamma.$

As a matter of fact Nair (1940) has tabulated the values of $k$ for which 
(11.2.8) holds for $n = 6, 7, \ldots, 81$ and for $\gamma = 0.95$ and 0.99. The exact 
value of the probability $P(x_{(k)} < x_{0.5} < x_{(n-k+1)})$ is $1 - 2 I_{0.5}(n - k + 1, k)$, 
a result found independently by Thompson (1936) and Savur (1937). 
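
For small $n$ the confidence coefficient in (11.2.6) is easy to evaluate directly. The Python sketch below relies on the standard identity $I_p(k, n - k + 1) = \sum_{i=k}^{n} \binom{n}{i} p^i (1 - p)^{n-i}$, which is not stated in this section, so that the right-hand side of (11.2.6) becomes a binomial sum over $i = k_1, \ldots, k_1 + k_2 - 1$. The function names are assumptions made for this example.

```python
from math import comb


def confidence_coefficient(n, k1, k2, p):
    """Right-hand side of (11.2.6) written as a binomial sum."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k1, k1 + k2))


def largest_k_for_median(n, gamma):
    """Largest k satisfying (11.2.8), or None if no k does."""
    best = None
    for k in range(1, n // 2 + 1):
        # The interval (x_(k), x_(n-k+1)) corresponds to k1 = k, k2 = n - 2k + 1.
        if confidence_coefficient(n, k, n - 2 * k + 1, 0.5) >= gamma:
            best = k
    return best
```

For $n = 6$ the interval between the smallest and largest observations covers the median with probability $1 - 2(\tfrac{1}{2})^6 = 0.96875$, so $k = 1$ is the largest value meeting $\gamma = 0.95$; for $n = 5$ no choice of $k$ reaches $\gamma = 0.99$.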

(b) Case of Large Samples 

Suppose $(x_1, \ldots, x_n)$ is a sample from a continuous c.d.f. $F(x)$, and let 
$x_p$ be the $p$th quantile. If $n_1$ is the number of components of the sample 
which are less than $x_p$, then $n_1$ is a random variable having the binomial 
distribution $Bi(n, p)$. We know from 9.2.1a that for large $n$, $n_1$ is asymptotically 
distributed according to $N(np, np(1 - p))$. Hence for a given 
confidence coefficient $\gamma$, we have 

(11.2.9)    $P\left( \frac{(n_1 - np)^2}{np(1 - p)} < y_\gamma^2 \right) \cong \gamma$

where 

(11.2.10)    $\frac{1}{\sqrt{2\pi}} \int_{-y_\gamma}^{y_\gamma} e^{-y^2/2}\, dy = \gamma.$

Thus, for large $n$, an approximate $100\gamma\%$ confidence interval $(\underline{p}_\gamma, \bar{p}_\gamma)$ for $p$ 
is given by the set of all values of $p$ satisfying the inequality in (11.2.9) for 
fixed $n_1$, $n$, and $y_\gamma$, that is, $\underline{p}_\gamma$ and $\bar{p}_\gamma$ are the two solutions of 

(11.2.11)    $\frac{(n_1 - np)^2}{np(1 - p)} = y_\gamma^2$

for $p$. 

Hence, we have for large $n$, 

(11.2.12)    $P(\underline{p}_\gamma < p < \bar{p}_\gamma) \cong \gamma$

which is equivalent to 

(11.2.13)    $P(x_{([n\underline{p}_\gamma])} < x_p < x_{([n\bar{p}_\gamma])}) \cong \gamma.$

Therefore, the two order statistics $(x_{([n\underline{p}_\gamma])}, x_{([n\bar{p}_\gamma])})$ constitute an 
approximate $100\gamma\%$ confidence interval for the quantile $x_p$, where $[n\underline{p}_\gamma]$ 
and $[n\bar{p}_\gamma]$ are the largest integers in $n\underline{p}_\gamma$ and $n\bar{p}_\gamma$ respectively. 
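
Equation (11.2.11) is a quadratic in $p$, namely $(n + y_\gamma^2) p^2 - (2 n_1 + y_\gamma^2) p + n_1^2 / n = 0$, and its two roots can be computed directly. The Python sketch below does this for illustrative values of $n_1$, $n$, and $\gamma$, taking $n_1$ as given, and then forms the integer ranks used in (11.2.13). The function names are hypothetical; `NormalDist` from the standard library supplies $y_\gamma$ as the $(1 + \gamma)/2$ point of the standard normal distribution, which is what (11.2.10) requires.

```python
from math import floor, sqrt
from statistics import NormalDist


def solve_p_limits(n1, n, gamma):
    """Return the two roots of (11.2.11), smaller root first."""
    y = NormalDist().inv_cdf((1 + gamma) / 2)  # y_gamma defined by (11.2.10)
    a = n + y * y
    b = -(2 * n1 + y * y)
    c = n1 * n1 / n
    root = sqrt(b * b - 4 * a * c)
    return (-b - root) / (2 * a), (-b + root) / (2 * a)


def order_statistic_ranks(n1, n, gamma):
    """Largest integers in n*p_lower and n*p_upper, as used in (11.2.13)."""
    p_lower, p_upper = solve_p_limits(n1, n, gamma)
    return floor(n * p_lower), floor(n * p_upper)
```

With $n_1 = 50$, $n = 100$, and $\gamma = 0.95$ the two roots lie symmetrically about 0.5 and the ranks come out as 40 and 59.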


11.3 CONFIDENCE INTERVALS FOR QUANTILE INTERVALS 


Suppose $p_1 < p_2$, $x_{p_1}$ and $x_{p_2}$ are two quantiles, and $x_{(k_1)}$ and $x_{(k_1+k_2)}$ are order 
statistics in a sample of size $n$. Then by an argument similar to that by which 
we obtained (11.2.4) we find 

(11.3.1)    $P(x_{(k_1)} < x_{p_1} < x_{p_2} < x_{(k_1+k_2)}) = \int_0^{p_1} \int_{p_2}^{1} f(u, v)\, dv\, du,$

the left side of which may be rewritten as 

(11.3.2) P((a?j)j» ?j»,) ^ (*(»,)»*(*,+»,))) 


ni (-l)*pr** 

fcj! <-0 i! (n - fea - i)! 


W" - fci - *2 + Uki-i)' 


If for a given $\gamma$ there exist values of $k_1$, $k_2$, and $n$ so that the probability 
expressed by (11.3.2) has the value $\gamma$, then we would say that $(x_{(k_1)}, 
x_{(k_1+k_2)})$ is a $100\gamma\%$ outer confidence interval for the quantile interval 
$(x_{p_1}, x_{p_2})$. 

In a similar manner a $100\gamma\%$ inner confidence interval $(x_{(k_1)}, x_{(k_1+k_2)})$ for 
$(x_{p_1}, x_{p_2})$ would be obtained by considering the relation 

(11.3.3)    $P(x_{p_1} < x_{(k_1)} < x_{(k_1+k_2)} < x_{p_2}) = \int_{p_1}^{p_2} \int_{p_1}^{v} f(u, v)\, du\, dv$

where $k_1$, $k_2$, and $n$ have values for which the probability expressed by 
(11.3.3) is equal to $\gamma$, and $f(u, v)$ is given by (11.2.1). 

In the case of outer (inner) confidence intervals the "best" choice of $k_1$ 
and $k_2$ would be regarded as that for which $k_2$ is as small (large) as possible. 
For instance, if $p_1 = p$ and $p_2 = 1 - p$, $p < \tfrac{1}{2}$, it can be shown that the 
"best" choice of order statistics in the sense just mentioned for both inner and 
outer confidence intervals would be that they be of the form $x_{(k)}$ and 
$x_{(n-k+1)}$ respectively. 


11.4 CONFIDENCE INTERVALS FOR QUANTILES 
IN FINITE POPULATIONS 


Suppose $\pi_N$ is a finite population whose elements have distinct $x$-values 
$x_{o1} < \cdots < x_{oN}$. It was shown in Section 8.8 that the p.f. of the rank $t$, 
among these $N$ values, of the $k$th order statistic $x_{(k)}$ in a sample of size $n$ from $\pi_N$ is 

(11.4.1)    $p_{N,n,k}(t) = \frac{\binom{t - 1}{k - 1} \binom{N - t}{n - k}}{\binom{N}{n}},$

the mass points being $t = k, k + 1, \ldots, N - n + k$. Now for 
any fixed value of $t$, say $t'$, 

(11.4.2)    $P(x_{(k)} \leq x_{ot'}) = \sum_{t=k}^{t'} p_{N,n,k}(t).$

If, for fixed $N$, $n$, $t'$ and $\gamma > 0$, there is a largest $k$, say $k'$, such that 

(11.4.3)    $\sum_{t=k'}^{t'} p_{N,n,k'}(t) \geq \gamma,$

we would regard $x_{(k')}$ as the "best" lower $100\gamma\%$ confidence limit for $x_{ot'}$. 
We may regard $x_{ot'}$ as the $(t'/N)$th quantile of the population $\pi_N$. Except 
for values of $N$, $n$, $t'$ and $1 - \gamma$ which are uninterestingly small such lower 
confidence limits can be shown to exist. 

In a similar manner the "best" upper $100\gamma\%$ confidence limit for $x_{ot'}$ 
is obtained by choosing the smallest $k$, say $k''$, for which 

(11.4.4)    $\sum_{t=t'}^{N-n+k''} p_{N,n,k''}(t) \geq \gamma.$

One can also write down a "best" $100\gamma\%$ confidence interval for $x_{ot'}$, 
that is, simultaneous upper and lower confidence limits, but the p.f. 
involved here is more cumbersome than the p.f. $p_{N,n,k}(t)$, and we shall not 
write it out. It can be shown, however, that if $\pi_N$ is a population having 
distinct $x$-values $x_{o1} < \cdots < x_{oN}$ and if $x_{(k_1)}$, $x_{(k_1+k_2)}$ are the indicated 
order statistics of a sample of size $n$ from $\pi_N$, then 

(11.4.5)    $\lim_{N \to \infty} P(x_{(k_1)} < x_{o[Np]} < x_{(k_1+k_2)}) = I_p(k_1, n - k_1 + 1) - I_p(k_1 + k_2, n - k_1 - k_2 + 1).$

Hence, for large $N$, $(x_{(k_1)}, x_{(k_1+k_2)})$ is a confidence interval for $x_{o[Np]}$ with confidence 
coefficient given approximately by the right-hand side of (11.4.5). 
11.5 TOLERANCE LIMITS 
(a) Case of Small Samples 

Suppose $(x_1, \ldots, x_n)$ is a sample from a continuous c.d.f. $F(x)$. Let
$L_1(x_1, \ldots, x_n) < L_2(x_1, \ldots, x_n)$ be any two observable symmetric
functions of $(x_1, \ldots, x_n)$ such that the distribution of $F(L_2) - F(L_1)$ does not
depend on $F(x)$, and such that for $0 < \beta < 1$,

(11.5.1)  $P[(F(L_2) - F(L_1)) > \beta] = \gamma.$

Then $L_1$ and $L_2$ will be called $100\beta\%$ distribution-free tolerance limits at
probability level $\gamma$. This concept is due to Shewhart (1931).

As pointed out by Wilks (1941, 1942), order statistics can be used for
distribution-free tolerance limits whenever $F(x)$ is continuous. To see this,
suppose $(x_{(1)}, \ldots, x_{(n)})$ are the order statistics of a sample from an
unknown continuous c.d.f. $F(x)$. For any two order statistics $x_{(k_1)}$, $x_{(k_1+k_2)}$
the amount of probability in the population distribution contained in the
interval $(x_{(k_1)}, x_{(k_1+k_2)})$, which we shall call $U_{k_2}$, is a random variable. It
will be seen from Section 8.7(b) that $U_{k_2}$ is the sum of coverages, and that

(11.5.2)  $U_{k_2} = F(x_{(k_1+k_2)}) - F(x_{(k_1)}).$

It follows from 8.7.6 that $U_{k_2}$ has the beta distribution $Be(k_2, n - k_2 + 1)$,
which, of course, does not depend on the population c.d.f. $F(x)$.
Now for given values of $\beta > 0$ and $\gamma > 0$, suppose $n$, $k_1$, and $k_2$ exist so that

(11.5.3)  $P(U_{k_2} > \beta) = \gamma.$

Then $x_{(k_1)}$ and $x_{(k_1+k_2)}$ are $100\beta\%$ distribution-free tolerance limits at
probability level $\gamma$. Note that $U_{k_2}$ is a special form of $F(L_2) - F(L_1)$ in (11.5.1).
Robbins (1944b) has shown that if $P(F(L_2) - F(L_1) > \beta) = \gamma$ for all
absolutely continuous $F(x)$, then $L_1$ and $L_2$ are necessarily order statistics.

Suppose $k_1 = c$ and $k_2 = n - 2c + 1$, thus making the interval
$(x_{(c)}, x_{(n-c+1)})$ symmetrically chosen from the order statistics. If we use
incomplete beta function notation, (11.5.3) reduces to

(11.5.4)  $1 - I_\beta(n - 2c + 1, 2c) = \gamma.$

For a fixed $c$ there may exist no sample size $n$ for which this equation
holds exactly. However, since the probability computed from the beta
distribution $Be(n - 2c + 1, 2c)$ on the interval $(1 - \varepsilon, 1)$, for any $\varepsilon > 0$,
can be made arbitrarily near 1 by taking $n$ sufficiently large, there exists
a smallest $n$ for which

(11.5.5)  $1 - I_\beta(n - 2c + 1, 2c) \ge \gamma.$
For instance, if we choose $\beta = 0.99$ and $\gamma = 0.95$ and $c = 1$ (that is,
$k_1 = 1$, $k_2 = n - 1$) we find $n = 473$. Murphy (1948) has tabulated, and
has presented in graphical form, values of $\beta$ for which (11.5.3) holds
for $n = 1(1)10(10)100(100)500$, $\gamma = 0.90, 0.95, 0.99$, and for $n - k_2 +
1 = m = 1(1)6(2)10(5)30(10)60(20)100$. Somerville (1958) has extended
Murphy’s results in tabular form.
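The sample size quoted above can be checked directly. The sketch below is an illustration only (the function names are hypothetical); it assumes $c = 1$, for which the beta distribution $Be(n - 1, 2)$ has the closed-form c.d.f. $I_\beta(n-1, 2) = n\beta^{n-1} - (n-1)\beta^n$, and searches for the smallest $n$ satisfying (11.5.5).

```python
def range_coverage_prob(n, beta):
    """P(F(x_(n)) - F(x_(1)) > beta) = 1 - I_beta(n - 1, 2) for the case c = 1.
    Uses the closed form I_beta(n - 1, 2) = n*beta**(n-1) - (n-1)*beta**n."""
    return 1.0 - n * beta ** (n - 1) + (n - 1) * beta ** n

def smallest_sample_size(beta, gamma):
    """Smallest n for which the sample range is a 100*beta% tolerance
    interval at probability level at least gamma, as in (11.5.5) with c = 1."""
    n = 2  # at least two observations are needed to form a range
    while range_coverage_prob(n, beta) < gamma:
        n += 1
    return n

if __name__ == "__main__":
    print(smallest_sample_size(0.99, 0.95))  # 473, the value given in the text
```

For other values of $c$ the same search applies, but the incomplete beta function would have to be evaluated numerically rather than by this closed form.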

The notion of distribution-free tolerance limits for one-dimensional
distributions having continuous c.d.f.’s extends to the case of
multidimensional distributions having continuous c.d.f.’s. Thus, suppose we
have a sample of size $n$ from an infinite population having a continuous
$k$-dimensional c.d.f. $F(x_1, \ldots, x_k)$. Suppose the $k$-dimensional sample
space of $(x_1, \ldots, x_k)$ is cut up into $n + 1$ mutually exclusive and exhaustive
sample blocks by ordering functions as described in
Section 8.7(c). Consider any rule for choosing some $r$ of these sample
blocks and let the union of the blocks be $T_r$. Let the sum of the coverages
for the selected blocks be $U_r$, that is,

(11.5.6)  $U_r = \int_{T_r} dF(x_1, \ldots, x_k).$

Then $U_r$ is a random variable, and it follows from 8.7.9 that $U_r$ has the
beta distribution $Be(r, n - r + 1)$. Thus, if for given $\beta > 0$ and $\gamma > 0$,

(11.5.7)  $P(U_r > \beta) = \gamma,$

$T_r$ would be a $100\beta\%$ distribution-free tolerance region at probability
level $\gamma$. If, for any fixed positive integer $c$, we choose $r = n - 2c + 1$,
then (11.5.7) reduces to (11.5.4), the same equation we had for
one-dimensional tolerance limits.

(b) Case of Finite Populations 

If $t'$ is chosen so that $t'/N = (1 - \beta)$ then (11.4.2) can be written as

(11.5.8)  $P(x_{(k')} \le x^0_{N(1-\beta)}) \ge \gamma,$

in which case we may regard $x_{(k')}$ as a $100\beta\%$ lower tolerance limit at
probability level $\gamma$, that is, the probability is at least $\gamma$ that the fraction of
elements in $\pi_N$ having $x$-values exceeding $x_{(k')}$ is at least $\beta$.

Similarly, if $t' = \beta N$, we can rewrite (11.4.4) as

(11.5.9)  $P(x_{(k'')} \ge x^0_{\beta N}) \ge \gamma,$

thus making $x_{(k'')}$ a $100\beta\%$ upper tolerance limit at probability level $\gamma$.

It is evident how a $100\beta\%$ tolerance interval at probability level $\gamma$ would
be defined.
If, for a large population $\pi_N$ having distinct $x$-values, we let $U_{N,k_2}$ be
the fraction of $x$-values in $\pi_N$ lying in the interval $(x_{(k_1)}, x_{(k_1+k_2)})$, it can be
shown that the limiting distribution of $U_{N,k_2}$, as $N \to \infty$, is $Be(k_2,
n - k_2 + 1)$. Hence, for large $N$, the interval is an approximate $100\beta\%$
tolerance interval at probability level $\gamma$.


11.6 ONE-SIDED CONFIDENCE CONTOURS FOR A 
CONTINUOUS DISTRIBUTION FUNCTION 


One of the fundamental problems in applying the sampling theory of
order statistics to statistical inference is to construct functions $F^+(x)$ and
$F^-(x)$ from the order statistics of a sample from a population having a
continuous c.d.f. $F(x)$ so that for a given $\gamma$

(11.6.1)  $P(F^+(x) > F(x); \text{ for all } x) = \gamma$
          $P(F^-(x) < F(x); \text{ for all } x) = \gamma.$

Such functions $F^+(x)$ and $F^-(x)$ will be called, respectively, upper and
lower $100\gamma\%$ confidence contours for $F(x)$.

An asymptotic solution for this problem for large $n$ was originally
obtained by Smirnov (1939a). More recently, Smirnov (1944), and also
Birnbaum and Tingey (1951) have obtained a simple expression for the
probabilities (11.6.1) that we shall present here.

Let $(x_{(1)}, \ldots, x_{(n)})$ be the order statistics of a sample from an infinite
population having c.d.f. $F(x)$. The empirical c.d.f. $F_n(x)$ is constructed
from the order statistics as follows:

(11.6.2)  $F_n(x) = \begin{cases} 0, & x < x_{(1)} \\ \dfrac{i-1}{n}, & x_{(i-1)} \le x < x_{(i)}, \quad i = 2, \ldots, n \\ 1, & x \ge x_{(n)}. \end{cases}$

For $0 < d < 1$ and for any value of $x$ let

(11.6.3)  $F_n^+(x, d) = \min\,[F_n(x) + d;\ 1]$
          $F_n^-(x, d) = \max\,[F_n(x) - d;\ 0]$

and let

(11.6.4)  $D_n^+(d) = \inf_x\,(F_n^+(x, d) - F(x))$
          $D_n^-(d) = \sup_x\,(F_n^-(x, d) - F(x)).$

Note that $P(D_n^+(d) > 0)$ is the probability that the graph of $F_n(x) + d$
never meets the graph of $F(x)$; and similarly $P(D_n^-(d) < 0)$ is the
probability that the graph of $F_n(x) - d$ never meets that of $F(x)$.
The main result can be stated as follows: 


11.6.1 Let $x_{(1)}, \ldots, x_{(n)}$ be the order statistics of a sample from a
continuous c.d.f. $F(x)$, and let $F_n^+(x, d)$ and $F_n^-(x, d)$ be constructed
from the empirical c.d.f. $F_n(x)$ as in (11.6.3). Then $P(D_n^+(d) > 0) =
P(D_n^-(d) < 0) = P_n(d)$ where

(11.6.5)  $P_n(d) = 1 - d \sum_{i=0}^{[n(1-d)]} \binom{n}{i} \left(1 - d - \frac{i}{n}\right)^{n-i} \left(d + \frac{i}{n}\right)^{i-1}$

and $[n(1 - d)]$ is the largest integer contained in $n(1 - d)$.
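A finite sum of this kind is easy to evaluate numerically. The sketch below is an illustration under stated assumptions, not a reproduction of any published table: it takes the sum in (11.6.5) to run from $i = 0$ to $[n(1-d)]$ with summand $\binom{n}{i}(1 - d - i/n)^{n-i}(d + i/n)^{i-1}$, and the function names are hypothetical. The limiting form $1 - e^{-2\lambda^2}$ with $d = \lambda/\sqrt{n}$ is included only for comparison.

```python
from math import comb, exp, floor, sqrt

def one_sided_prob(n, d):
    """P_n(d): probability that F_n(x) + d stays above F(x) for all x,
    computed from the finite sum, for 0 < d < 1."""
    k = floor(n * (1.0 - d))  # largest integer contained in n(1 - d)
    total = sum(
        comb(n, i) * (1.0 - d - i / n) ** (n - i) * (d + i / n) ** (i - 1)
        for i in range(k + 1)
    )
    return 1.0 - d * total

def smirnov_limit(lam):
    """Large-sample limit of P_n(lam / sqrt(n))."""
    return 1.0 - exp(-2.0 * lam * lam)

if __name__ == "__main__":
    n, lam = 100, 1.0
    print(one_sided_prob(n, lam / sqrt(n)), smirnov_limit(lam))
```

Two easy checks: for $n = 1$ the sum has a single term and $P_1(d) = d$, and whenever $d > 1 - 1/n$ only the $i = 0$ term survives, giving $1 - (1 - d)^n$.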

To establish 11.6.1 we first transform to new random variables $y_{(1)}, \ldots,
y_{(n)}$ defined by

(11.6.6)  $y_{(i)} = F(x_{(i)}), \quad i = 1, \ldots, n.$

Then we know from (8.7.2) that $y_{(1)}, \ldots, y_{(n)}$ have p.e.

(11.6.7)  $n!\, dy_{(1)} \cdots dy_{(n)}$

inside the region $0 < y_{(1)} < \cdots < y_{(n)} < 1$ and 0 elsewhere.

First consider $P(D_n^+(d) > 0)$. It is evident that the order statistics
$x_{(1)}, \ldots, x_{(n)}$ satisfy the inequality $D_n^+(d) > 0$ if and only if the order
statistics $y_{(1)}, \ldots, y_{(n)}$ satisfy the following set of inequalities:

(11.6.8)  $y_{(i-1)} < y_{(i)} < \dfrac{i-1}{n} + d, \quad i = 1, \ldots, k + 1$
          $y_{(i-1)} < y_{(i)} < 1, \quad i = k + 2, \ldots, n$

where $k$ is the largest integer in $n(1 - d)$ and $y_{(0)} = 0$. Similarly, $D_n^-(d) <
0$ will be satisfied if and only if

(11.6.9)  $\dfrac{i}{n} - d < y_{(i)} < y_{(i+1)}, \quad i = n - k, \ldots, n$
          $0 < y_{(i)} < y_{(i+1)}, \quad i = 1, \ldots, n - k - 1$

where $y_{(n+1)} = 1$. But if we make the change of variables $y'_{(i)} =
1 - y_{(n-i+1)}$, $i = 1, \ldots, n$, then the order statistics $y'_{(1)}, \ldots, y'_{(n)}$ have
p.e. (11.6.7) and the inequalities (11.6.9) reduce to (11.6.8). Therefore,
$P(D_n^+(d) > 0) = P(D_n^-(d) < 0)$. The common value of these
probabilities, say $P_n(d)$, is obtained by integrating the p.e. (11.6.7) over
the region within $0 < y_{(1)} < \cdots < y_{(n)} < 1$ determined by the inequalities
(11.6.8). That is,

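(11.6.10)  $P_n(d) = n!\, G_n(k, d)$

where

(11.6.11)  $G_n(k, d) = \int_0^{d} \int_{y_{(1)}}^{1/n+d} \cdots \int_{y_{(k)}}^{k/n+d} \int_{y_{(k+1)}}^{1} \cdots \int_{y_{(n-1)}}^{1} dy_{(n)} \cdots dy_{(1)}.$

Integrating with respect to $y_{(n)}, \ldots, y_{(k+2)}$, we find

(11.6.12)  $G_n(k, d) = \int_0^{d} \int_{y_{(1)}}^{1/n+d} \cdots \int_{y_{(k)}}^{k/n+d} \dfrac{(1 - y_{(k+1)})^{n-k-1}}{(n-k-1)!}\, dy_{(k+1)} \cdots dy_{(1)}.$

From here on the evaluation of $G_n(k, d)$ proceeds by induction. Thus, by
integrating with respect to $y_{(k+2)}$ one finds

(11.6.13)  $G_n(k + 1, d) = G_n(k, d) - \dfrac{\left(1 - d - \frac{k+1}{n}\right)^{n-k-1}}{(n-k-1)!}\, H_n(k, d)$

where

(11.6.14)  $H_n(k, d) = \int_0^{d} \int_{y_{(1)}}^{1/n+d} \cdots \int_{y_{(k)}}^{k/n+d} dy_{(k+1)} \cdots dy_{(1)}.$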


Omitting the details of a mathematical induction, we find 

which can be written as 


(11.6.15) H„(k, d)=- 
which can be written 

(11.6.16) H„ik,d) = 




(k + iy.dt'‘ 

( 11 . 6 . 17 ) |— [e<‘'/(* + l) + l/n)< _ g<<</(* + l))(j*: + l| _ Q 


But we note that 

1- 

lat* 


Therefore, 
(11.6.18) 
H„(k, d) = 


a* 


(k + 1)! Idt^ 


,(<( + (fc + l)/n)( 


'1 - 

X=0 + 1) 


:h^)‘ 


Substituting this value for $H_n(k, d)$ in (11.6.13) and noting that

(11.6.19)  $G_n(0, d) = \dfrac{1 - (1 - d)^n}{n!}$

we find by induction the value of $G_n(k, d)$, which, when inserted in (11.6.10),
and noting that $k$ is the largest integer in $n(1 - d)$, yields the expression
in (11.6.5) for $P_n(d)$, thus completing the argument for 11.6.1.

Birnbaum and Tingey have tabulated values of $d$ for which $P_n(d) = \gamma$
for $\gamma = 0.90, 0.95, 0.99, 0.999$, and $n = 5, 8, 10, 20, 40, 50$. For values
of $n$ greater than 50 the Smirnov asymptotic approximation (11.6.22)
provides a good approximation for values of $d$.

Now let us consider the limit of $P_n(d)$ as $n \to \infty$. Putting $d = \lambda/\sqrt{n}$,
we may write (11.6.5) as follows:


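(11.6.20)  $P_n\!\left(\dfrac{\lambda}{\sqrt{n}}\right) = 1 - \lambda \left\{ \sum_{i=0}^{[n(1-\lambda/\sqrt{n})]} \sqrt{n} \binom{n}{i} \left(1 - \dfrac{\lambda}{\sqrt{n}} - \dfrac{i}{n}\right)^{n-i} \left(\dfrac{\lambda}{\sqrt{n}} + \dfrac{i}{n}\right)^{i-1} \Delta y \right\}$

where $\Delta y = 1/n$. Making use of Stirling’s approximation for large
factorials (7.6.29), it can be verified that the sum in { } converges to the
integral

(11.6.21)  $\dfrac{1}{\sqrt{2\pi}} \int_0^1 \dfrac{1}{y\sqrt{y(1-y)}} \exp\!\left[-\dfrac{\lambda^2}{2y(1-y)}\right] dy$

as $n \to \infty$, which, when integrated, yields

$\dfrac{1}{\lambda}\, e^{-2\lambda^2}.$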

Hence, we have the following result:

11.6.2 Under the conditions of 11.6.1,

(11.6.22)  $\lim_{n \to \infty} P_n\!\left(\dfrac{\lambda}{\sqrt{n}}\right) = 1 - e^{-2\lambda^2}.$

This result was originally established by Smirnov (1939a), but the more
direct derivation from (11.6.5) is due to Dempster (1955).
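As a worked illustration of how the limit is used in practice, the relation $1 - e^{-2\lambda^2} = \gamma$ can be inverted to give an approximate width $d$ for a one-sided contour. The numbers below ($\gamma = 0.95$, $n = 100$) are chosen here purely for illustration and are not taken from the text.

```latex
\begin{align*}
1 - e^{-2\lambda^{2}} = \gamma
  &\;\Longrightarrow\; \lambda = \sqrt{-\tfrac{1}{2}\ln(1-\gamma)}, \\
\gamma = 0.95
  &\;\Longrightarrow\; \lambda = \sqrt{\tfrac{1}{2}(2.9957)} \approx 1.224, \\
n = 100
  &\;\Longrightarrow\; d = \lambda/\sqrt{n} \approx 0.122 .
\end{align*}
```

Under this approximation the graph of $F_n(x) + 0.122$ would lie above $F(x)$ for all $x$ with probability roughly 0.95 when $n = 100$.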


11.7 CONFIDENCE BANDS FOR A CONTINUOUS 
DISTRIBUTION FUNCTION 

The problem of establishing two-sided contours, or a confidence band,
for $F(x)$, that is, of determining the value of the probability

(11.7.1)  $P(D_n < d)$

where $D_n = \sup_x |F_n(x) - F(x)|$

for arbitrary $d$, is more difficult than the problem of the one-sided contour
discussed in the preceding section. Kolmogorov (1933b) gave an
asymptotic solution to this problem for large $n$ and recurrence formulas from
which values of (11.7.1) can be computed for finite $n$. Wald and Wolfowitz
(1939) have given a solution for finite $n$ in determinant form. More recently,
Massey (1950) has given a fairly simple solution to the problem if $n$ is
finite and if $d$ is a multiple of $1/n$. His solution is in the form of recurrence
formulas which we shall consider here. Massey’s result can be stated as
follows:


11.7.1 Let $(x_{(1)}, \ldots, x_{(n)})$ be the order statistics of a sample of size $n$ from
a continuous c.d.f. $F(x)$. Let

(11.7.2)  $D_n = \sup_x |F_n(x) - F(x)|$

where $F_n(x)$ is defined in (11.6.2). Then

(11.7.3)  $P\!\left(D_n < \dfrac{k}{n}\right) = \dfrac{n!}{n^n}\, U(k, n), \quad k = 1, \ldots, n - 1$

where $U(j, m + 1)$, $j = 1, \ldots, 2k - 1$; $m = 0, 1, \ldots, n - 1$,
satisfy the system of equations

(11.7.4)  $U(j, m + 1) = \sum_{i=1}^{j+1} \dfrac{U(i, m)}{(j + 1 - i)!}$

subject to the boundary conditions

(11.7.5)  $U(i, m) = 0, \quad i > m + k$
          $U(i, 0) = 0, \quad i = 1, \ldots, k - 1$
          $U(k, 0) = 1.$

To establish this result suppose $(x_1, \ldots, x_n)$ is a sample from the
continuous c.d.f. $F(x)$, and for convenience let us denote the random
variable $F(x_i)$ by $y_i$, $i = 1, \ldots, n$. Then $y_i$ has the rectangular
distribution on the interval (0, 1). Let $G(y)$ be the c.d.f. of $y_i$ and
$G_n(y)$ be the empirical c.d.f. constructed from the order statistics
$(y_{(1)}, \ldots, y_{(n)})$. Then, of course,

(11.7.6)  $P\!\left(D_n < \dfrac{k}{n}\right) = P\!\left(\sup_y |G_n(y) - G(y)| < \dfrac{k}{n}\right).$

We now cut up the interval (0, 1] into $n$ equal intervals $I_s = ((s - 1)/n,
s/n]$, $s = 1, \ldots, n$. Let $(r_1, \ldots, r_n)$ be a random variable (degenerate,
with $r_1 + \cdots + r_n = n$) denoting the numbers of elements in the sample
$(y_1, \ldots, y_n)$ falling into $I_1, \ldots, I_n$, respectively. The $r$’s, of course, have
the $(n - 1)$-dimensional multinomial distribution $M(n; 1/n, \ldots, 1/n)$
whose p.f. is

(11.7.7)  $p(r_1, \ldots, r_n) = \dfrac{n!}{r_1! \cdots r_n!\, n^n}.$

Now $(r_1, \ldots, r_n)$ uniquely determines $G_n(y)$, and hence the value of
$P(D_n < k/n)$ is determined by summing $p(r_1, \ldots, r_n)$ over all points in
the sample space of $(r_1, \ldots, r_n)$ for which $|G_n(y) - G(y)| < k/n$ for all $y$.

As we follow the graph of $G_n(y)$ from left to right, which lies completely
within the band $E$ for which $|G_n(y) - G(y)| < k/n$ for $y = 1/n, 2/n, \ldots,
(m + 1)/n$, the graph must pass through one of the points $(m/n,
(m - k + i)/n)$, $i = 1, \ldots, 2k - 1$. Let us call these points $A(i, m)$,
$i = 1, \ldots, 2k - 1$ respectively. Now let

(11.7.8)  $U(i, m) = \sum_{(i)} \dfrac{1}{r_1! \cdots r_m!},$

where $\sum_{(i)}$ denotes summation over all sets of values of $(r_1, \ldots, r_m)$ for
which the graph of $G_n(y)$ arrives at $A(i, m)$ while remaining inside the band
$E$. Then, since $G_n(y)$ is nondecreasing, its path can reach $A(j, m + 1)$
only by having passed through one of the points $A(1, m), A(2, m), \ldots,
A(j + 1, m)$ and having $r_{m+1}$ take on the values $j, j - 1, \ldots, 1, 0$,
respectively. Therefore, we must have


(11.7.9)  $U(j, m + 1) = \dfrac{U(1, m)}{j!} + \dfrac{U(2, m)}{(j - 1)!} + \cdots + \dfrac{U(j + 1, m)}{0!}$

for $j = 1, 2, \ldots, 2k - 1$ and $m = 0, 1, \ldots, n - 1$, where, of course,
$U(i, m) = 0$ if $i > m + k$. Furthermore, it is evident that $U(i, m)$ satisfies
the following boundary conditions:

(11.7.10)  $U(i, 0) = 0, \quad i = 1, \ldots, k - 1$
           $U(k, 0) = 1.$

If the system of difference equations (11.7.9) is solved for $U(k, n)$ subject
to conditions (11.7.10), we obtain $P(D_n < k/n)$ by formula (11.7.3), thus
completing the argument for 11.7.1.
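The recurrence lends itself to direct computation. The following is a minimal sketch, not Massey's own procedure: the function name `massey_prob` is hypothetical, the starting values are taken as $U(k, 0) = 1$ and $U(i, 0) = 0$ for every other $i$, and $U$ is treated as zero for any index outside the band $1, \ldots, 2k - 1$ (an assumption made here so that the sum up to $i = j + 1$ is always defined).

```python
from math import factorial

def massey_prob(n, k):
    """P(D_n < k/n) from the recurrence
    U(j, m+1) = sum_{i=1}^{j+1} U(i, m) / (j + 1 - i)!,
    followed by P = n! / n^n * U(k, n)."""
    width = 2 * k - 1
    # u[i] holds U(i, m) for i = 1..2k-1; u[0] and u[2k] stay 0 (outside the band).
    u = [0.0] * (width + 2)
    u[k] = 1.0  # starting condition U(k, 0) = 1
    for _ in range(n):
        nxt = [0.0] * (width + 2)
        for j in range(1, width + 1):
            nxt[j] = sum(u[i] / factorial(j + 1 - i) for i in range(1, j + 2))
        u = nxt
    return factorial(n) / n ** n * u[k]

if __name__ == "__main__":
    print(massey_prob(5, 1))  # equals 5!/5^5, one observation in each fifth
```

Small cases can be verified by hand: for $k = 1$ the band admits only the path with exactly one observation in each interval, so the probability is $n!/n^n$; for $n = 3$, $k = 2$ the complement consists of the two events "all observations above 2/3" and "all observations below 1/3", giving $25/27$.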

Various values of $P(D_n < k/n)$ for $n = 5(5)80$ and $k = 1(1)9$ have been
tabulated by Massey, and from these values he has computed by
interpolation values of $P(D_n < \lambda/\sqrt{n})$ for $n = 10(10)80$ and $\lambda = 0.9(0.1)1.40$.

Kolmogorov’s (1933b) asymptotic result referred to earlier is that under
the conditions of 11.7.1

(11.7.11)  $\lim_{n \to \infty} P\!\left(D_n < \dfrac{\lambda}{\sqrt{n}}\right) = \sum_{i=-\infty}^{\infty} (-1)^i e^{-2i^2\lambda^2}.$

Kolmogorov’s proof of this result is complicated. A simpler proof has
been given by Feller (1948). Doob (1949) and Donsker (1952) have
provided a proof by the use of Gaussian stochastic process theory.
Darling (1957) has published an expository article covering this and
related problems, which has an extensive list of references.
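The series in (11.7.11) converges very quickly for moderate $\lambda$, so a truncated sum is adequate for numerical work. The sketch below is illustrative only; the function name and the truncation point `terms` are assumptions made here, and the series is folded into the equivalent form $1 + 2\sum_{i \ge 1} (-1)^i e^{-2i^2\lambda^2}$.

```python
from math import exp

def kolmogorov_limit(lam, terms=100):
    """Truncated evaluation of sum_{i=-inf}^{inf} (-1)^i exp(-2 i^2 lam^2),
    using the symmetry of the summand in i. Intended for lam > 0."""
    if lam <= 0:
        return 0.0
    tail = sum((-1) ** i * exp(-2.0 * i * i * lam * lam)
               for i in range(1, terms + 1))
    return 1.0 + 2.0 * tail

if __name__ == "__main__":
    # lam near 1.36 corresponds to a limiting probability close to 0.95.
    print(kolmogorov_limit(1.36))
```

For large $n$ this suggests a band of half-width about $1.36/\sqrt{n}$ around $F_n(x)$ for a confidence coefficient near 0.95.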

It should be noted that (11.7.11) implies that for arbitrary $\varepsilon > 0$,

(11.7.12)  $\lim_{n \to \infty} P(D_n < \varepsilon) = 1,$

that is, $F_n(x)$ converges in probability to $F(x)$ uniformly in $x$ as $n \to \infty$.
PROBLEMS

11.1 If in the sample space of $(u, v)$, whose p.e. is (11.2.1), $E$ is the set of
points for which $u < p$ and $F$ is the set for which $v > p$, derive (11.2.6) from the
basic probability law $P(E \cup F) = P(E) + P(F) - P(E \cap F)$.

11.2 In (11.2.8) show that

$P(x_{(k)} < x_{0.5} < x_{(n-k+1)}) = 1 - 2 I_{0.5}(n - k + 1, k).$


11.3 If $(x_{(1)}, \ldots, x_{(n)})$ are the order statistics of a sample from the continuous
c.d.f. $F(x)$, show that the probability is $1 - n\beta^{n-1} + (n - 1)\beta^n$ that
$F(x_{(n)}) - F(x_{(1)}) > \beta$.
(That is, the fraction of the population contained in the sample range exceeds $\beta$
with probability $1 - n\beta^{n-1} + (n - 1)\beta^n$.)

11.4 (Continuation) Show that for any positive values of $\delta_1$ and $\delta_2$ for which
$\delta_1 + \delta_2 < 1$, the probability is $1 - (1 - \delta_1)^n - (1 - \delta_2)^n + (1 - \delta_1 - \delta_2)^n$ that
both of the following inequalities hold

$F(x_{(1)}) < \delta_1, \quad 1 - F(x_{(n)}) < \delta_2.$

11.5 (Continuation) Show that for any fixed integer $k < n/2$ the maximum
value of $P(x_{(k+r)} < x_{0.5} < x_{(n-k+r+1)})$ over all possible values of $r$ (positive or
negative integers or zero) occurs for $r = 0$, where $F(x_{0.5}) = 0.5$.

11.6 (Continuation) Show that

$P(x_{(k)} < x_p) = I_p(k, n - k + 1)$

where $F(x_p) = p$.

11.7 (Continuation) Show that $(x_{(k)}, x_{(n-k+1)})$ may be regarded as a confidence
interval having confidence coefficient

$\gamma_m = \sum_{t=k}^{n-k} p_m(t)$

for the median of a further independent sample of size $2m + 1$ from the same
distribution where


PmU) = 


{m + 1 ) 
(m + / + 1) 



llm + « + l\ 
\ /n + r + 1 / 


Show that the limit of $\gamma_m$ as $m \to \infty$ is $1 - 2 I_{0.5}(n - k + 1, k)$.

11.8 In (11.2.6) show that if $p = 0.5$,

$P(x_{(k_1)} < x_{0.5} < x_{(k_1+k_2)}) = \left(\dfrac{1}{2}\right)^n \sum_{i=k_1}^{k_1+k_2-1} \binom{n}{i}.$

11.9 If $x_{(1)}, \ldots, x_{(n)}$ are the order statistics of a sample of size $n$ from a finite
population of size $2m + 1$ whose elements have different values of $x$, show that
$(x_{(k)}, x_{(n-k+1)})$ is a confidence interval for the population median having
confidence coefficient identical with that given in the preceding problem.

11.10 If a sample of size $n$ is drawn from a continuous c.d.f. show that the
probability is

$\dfrac{n(n - 1)}{m + n} \binom{m}{t} \Big/ \binom{m + n - 1}{m - t + 1}$

that $t$ $x$'s in a further independent sample of size $m$ will fall within the range of
the first sample.

11.11 (Continuation) Let $U_m$ be the fraction of the $x$'s in the second
sample which lie within the range of the first. Show that the limiting
distribution of $U_m$ as $m \to \infty$ is the beta distribution $Be(n - 1, 2)$. More
generally if $x_{(k_1)}$ and $x_{(k_1+k_2)}$ are the indicated order statistics in the first sample,
and if $U_{m,k_2}$ is the fraction of $x$'s in the second sample lying in the interval
$(x_{(k_1)}, x_{(k_1+k_2)})$, show that the limiting distribution of $U_{m,k_2}$ is the beta
distribution $Be(k_2, n - k_2 + 1)$.

11.12 Suppose a sample of size $n$ from a continuous two-dimensional
c.d.f. is represented by $n$ points in the $xy$-plane. Vertical lines are drawn,
respectively, through the two points having the smallest and the largest
$x$-coordinates. Horizontal lines are drawn through the two points of the remaining
$n - 2$ points which have the smallest and largest $y$-coordinates. Consider the
fraction $U$ of the population contained in the rectangle bounded by these four
lines. (Note that $U$ is the coverage of the rectangle.) Show that the probability is



that the coverage U exceeds p. 

11.13 Referring to Section 11.5(a) for the definition of show that for 

fixed k and a oo, -> 1, so that w(l — A, 

lim > /f) = 1 — ^ ^ ^ • 

11.14 Verify (11.4.5). 

11.15 Verify that $H_n(k, d)$ is given by (11.6.18).

11.16 Show that the quantity in { } in (11.6.20) converges to the integral
(11.6.21) as $n \to \infty$.

11.17 Let $\bar{x}$ and $s^2$ be the mean and variance of a sample of size $n$ from $N(\mu, \sigma^2)$
and let $G(x)$ be the c.d.f. of $N(\mu, \sigma^2)$. Let $(L_1, L_2)$ be the interval $\bar{x} \pm \lambda s$, $\lambda > 0$.
Show that $G(L_2) - G(L_1)$, as a coverage, has a distribution depending only on
$\lambda$ and that for $\lambda = t_{n-1,\gamma} \sqrt{(n + 1)/n}$, where $t_{n-1,\gamma}$ is defined by (10.2.10),
$\mathscr{E}(G(L_2) - G(L_1)) = \gamma$. [See Wilks (1941) and Wald and Wolfowitz (1946).]
Many problems of statistical estimation deal with the problem of
sampling from a c.d.f. of specified functional form $F(x; \theta)$, where $\theta$ is an
unknown (real) parameter whose true value $\theta_0$ is to be estimated from the
elements of a sample $(x_1, \ldots, x_n)$ assumed to have been drawn from
$F(x; \theta)$. The true value $\theta_0$ is a point in a parameter space $\Omega$ of values of $\theta$,
where $\Omega$ is an open interval or region (or all) of a Euclidean space. The
parameter space $\Omega$ is sometimes called the set of admissible values of $\theta$.
Needless to say, both $x$ and $\theta$ can be multidimensional.

There are two important types of parametric estimators, namely, point
estimators and interval estimators. In point estimation, the estimator for
the true value $\theta_0$ of a parameter $\theta$ in $F(x; \theta)$ is an observable random
variable, say $\tilde{\theta}(x_1, \ldots, x_n)$, which is a function of the sample elements
$(x_1, \ldots, x_n)$, and whose distribution is, in some sense, concentrated
around the true value $\theta_0$ of $\theta$. As in linear estimation, it will be found that
the variance of the point estimator is often a reasonable criterion for
measuring the concentration.

In interval estimation we devise two observable random variables
$\underline{\theta}(x_1, \ldots, x_n)$, $\bar{\theta}(x_1, \ldots, x_n)$, usually abbreviated $(\underline{\theta}, \bar{\theta})$ where $\underline{\theta} < \bar{\theta}$, such
that there is a specified probability that the random interval $(\underline{\theta}, \bar{\theta})$ contains
$\theta_0$ and is, in some sense, as short as possible. These ideas can be extended,
of course, to the case where $x$ or $\theta$ (or both) in $F(x; \theta)$ are multidimensional.
Instead of talking about an “estimator $\tilde{\theta}$ for the true value $\theta_0$ of a
parameter $\theta$”, we will usually say “estimator $\tilde{\theta}$ for $\theta$.”

The linear estimators which have been considered in Chapter 10 are
examples of point estimators. We have given specific examples of interval
estimators in Sections 10.2(c), 10.4, 10.5, 10.6, and 11.2(a) which
arise in sampling from normal distributions.
In this chapter we shall discuss the ideas of point and interval estimation 
in the more general setting of parametric estimation for samples from 
infinite populations and then present some of the basic results for finite 
samples as well as some asymptotic results for large samples. 

12.1 DIFFERENTIATION OF PARAMETRIC DISTRIBUTION
FUNCTIONS

(a) Case of a One-dimensional Parameter 

In dealing with parametric statistical estimation, the testing of
parametric statistical hypotheses, and related problems, we shall need
certain properties of derivatives of distribution functions which depend
on a parameter $\theta$. It will be useful to discuss these questions briefly
here.

Suppose $x$ is a random variable which has a c.d.f. $F(x; \theta)$ where $\theta$ is a
(real) one-dimensional parameter having values in a parameter space $\Omega$.
We shall consider $x$ as a one-dimensional random variable for convenience,
although it will be seen that all results obtained will hold, with minor
changes in notation, if $x$ is $k$-dimensional.

Thus, for various points $\theta_1, \theta_2, \ldots$ in $\Omega$, we have a corresponding set of
c.d.f.’s $F(x; \theta_1), F(x; \theta_2), \ldots$. We shall be especially interested in some
open interval $\Omega_0 \subset \Omega$ containing a particular point $\theta_0$, the true value of $\theta$.
Now suppose we differentiate both sides of

(12.1.1)  $\int_{-\infty}^{\infty} dF(x; \theta) = 1,$

one or more times with respect to $\theta$. We shall consider what happens
with one and two differentiations. If the two differentiations are formally
performed, we obtain

(12.1.2)  $\int_{-\infty}^{\infty} \left[\dfrac{\partial}{\partial\theta} \log dF(x; \theta)\right] dF(x; \theta) = 0$

and

(12.1.3)  $\int_{-\infty}^{\infty} \left[\dfrac{\partial^2}{\partial\theta^2} \log dF(x; \theta)\right] dF(x; \theta) + \int_{-\infty}^{\infty} \left[\dfrac{\partial}{\partial\theta} \log dF(x; \theta)\right]^2 dF(x; \theta) = 0.$

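To examine (12.1.2) and (12.1.3) a little more closely and for later
reference, let us write, for the moment,

(12.1.4)  $S(x; \theta) = \dfrac{\partial}{\partial\theta} \log dF(x; \theta)$

(12.1.5)  $S'(x; \theta) = \dfrac{\partial}{\partial\theta} S(x; \theta)$

(12.1.6)  $H(\theta, \theta') = \int_{-\infty}^{\infty} \log dF(x; \theta')\, dF(x; \theta)$

(12.1.7)  $A(\theta, \theta') = \int_{-\infty}^{\infty} S(x; \theta')\, dF(x; \theta)$

(12.1.8)  $B^2(\theta, \theta') = \int_{-\infty}^{\infty} [S(x; \theta')]^2\, dF(x; \theta)$

(12.1.9)  $C(\theta, \theta') = \int_{-\infty}^{\infty} S'(x; \theta')\, dF(x; \theta),$

where $\theta$ is any point in $\Omega_0$, and $(\theta, \theta')$ any point in the Cartesian product
set $\Omega_0 \times \Omega_0$.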

First, we consider $S(x; \theta)$ and $S'(x; \theta)$. Assuming that $F(x; \theta)$ has a
first derivative with respect to $\theta$ for any point $\theta$ in $\Omega_0$ and for all $x$, except
possibly for a set of probability zero, it is evident that $S(x; \theta)$ is defined as

(12.1.10)  $S(x; \theta) = \lim_{x' \to x} \dfrac{\dfrac{\partial}{\partial\theta}\,[F(x'; \theta) - F(x; \theta)]}{[F(x'; \theta) - F(x; \theta)]}$

where $x < x'$, provided the indicated limit exists. We shall assume the
limit exists for all $\theta$ in $\Omega_0$ and all $x$, and that it is nonzero on a set
of values of $x$ of positive probability. $S'(x; \theta)$ is similarly defined. If
for $\theta = \theta'$ $x$ is a random variable which has c.d.f. $F(x; \theta')$, then $S(x; \theta)$
and $S'(x; \theta)$ are random variables for $\theta \in \Omega_0$.

Note that if $x$ is a discrete random variable having p.f. $p(x; \theta)$, then

(12.1.10a)  $S(x; \theta) = \dfrac{\partial}{\partial\theta} \log p(x; \theta),$

and if $x$ is a continuous random variable having p.d.f. $f(x; \theta)$

(12.1.10b)  $S(x; \theta) = \dfrac{\partial}{\partial\theta} \log f(x; \theta).$

Similar statements hold for $S'(x; \theta)$ in the cases of discrete and continuous
random variables.

Next, let us consider $H(\theta, \theta')$. Let the $x$-axis be divided into disjoint
intervals $(x_\alpha, x_{\alpha+1}]$, $\alpha = \cdots -1, 0, +1, \ldots$ and let $I_\alpha$ denote the interval
$(x_\alpha, x_{\alpha+1}]$. Then

$P(x \in I_\alpha \mid \theta) = F(x_{\alpha+1}; \theta) - F(x_\alpha; \theta)$

with a similar meaning for $P(x \in I_\alpha \mid \theta')$. Let $\Delta = \max_\alpha \{\text{length } I_\alpha\}$ and put

(12.1.11)  $H_\Delta(\theta, \theta') = \sum_{\alpha=-\infty}^{\infty} \log P(x \in I_\alpha \mid \theta') \cdot P(x \in I_\alpha \mid \theta)$
           $= \log \left\{ \prod_{\alpha=-\infty}^{\infty} [P(x \in I_\alpha \mid \theta')]^{P(x \in I_\alpha \mid \theta)} \right\}.$

It is seen that every term in the upper line of (12.1.11) is negative, and hence
$H_\Delta(\theta, \theta')$ is negative. In order to avoid having terms equal to $-\infty$ it is
sufficient to assume that there exists no set $E$ in the $x$-space for which
$P(x \in E \mid \theta)$ and $P(x \in E \mid \theta')$ are not either both zero or both positive.
Two distributions having c.d.f.’s $F(x; \theta)$ and $F(x; \theta')$ satisfying this
condition are said to be absolutely continuous with respect to each other.
This assumption is a little stronger than is required here since it would be
sufficient for $F(x; \theta')$ to be absolutely continuous with respect to $F(x; \theta)$,
that is, if no set $E$ exists for which $P(x \in E \mid \theta') = 0$ and $P(x \in E \mid \theta) > 0$, but
this generality is offset by the symmetry of $F(x; \theta)$ and $F(x; \theta')$ in this respect.

Making use of the fact that if $p_1, \ldots, p_r$ and $q_1, \ldots, q_r$ are any two
sets of positive numbers

(12.1.12)  $p_1^{q_1} \cdots p_r^{q_r} \le (p_1 + \cdots + p_r)^{q_1 + \cdots + q_r},$

it will be seen that the positive quantity in { } in (12.1.11) cannot increase
with successive subdivisions of the intervals in $\{I_\alpha\}$. Hence if there is a
set $\{I_\alpha\}$ such that $H_\Delta(\theta, \theta')$ is finite, then the finite negative quantity
cannot decrease as $\Delta \to 0$, and hence $\lim_{\Delta \to 0} H_\Delta(\theta, \theta')$ exists at all
points $(\theta, \theta')$ in $\Omega_0 \times \Omega_0$ and is nonpositive. The limit is denoted by the
integral in (12.1.6). $H(\theta, \theta')$ is $\sum_x \log p(x; \theta')\, p(x; \theta)$ in the case of a discrete
random variable with p.f. $p(x; \theta)$, and $\int_{-\infty}^{\infty} \log f(x; \theta')\, f(x; \theta)\, dx$ in the case
of a continuous random variable with p.d.f. $f(x; \theta)$.

The function $H(\theta, \theta')$ is of basic importance in connection with
information theory, entropy in statistical mechanics, optimum estimation of
parameters, and optimum statistical tests in statistical inference.

If, for a given function $g(x; \theta')$, measurable with respect to $F(x; \theta)$,
there exists a non-negative function $h(x)$, measurable and having finite
mean value with respect to $F(x; \theta)$ and such that

$|g(x; \theta')| < h(x)$

for $(\theta, \theta') \in \Omega_0 \times \Omega_0$, we shall say that $g(x; \theta')$ is dominated by the
integrable function $h(x)$.

In order for (12.1.2) to be valid, it is sufficient for $(\partial/\partial\theta') \log dF(x; \theta')$
to be dominated by some integrable function $h_1(x)$. Similarly, (12.1.3)
is valid if $(\partial^2/\partial\theta'^2) \log dF(x; \theta')$ and $[(\partial/\partial\theta') \log dF(x; \theta')]^2$ are dominated by
integrable functions $h_2(x)$ and $h_3(x)$. For general theorems on the sufficiency
of conditions such as these the reader is referred to books on integration
and real variable analysis such as those by McShane (1944), McShane and
Botts (1959) and Saks (1937).

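By expressing (12.1.2) and (12.1.3) in terms of mean values, we shall say
that $F(x; \theta)$ is regular with respect to its first $\theta$-derivative in $\Omega_0$ if

(12.1.2a)  $\mathscr{E}(S(x; \theta)) = \dfrac{\partial}{\partial\theta} \int_{-\infty}^{\infty} dF(x; \theta) = 0,$

and regular with respect to its second $\theta$-derivative in $\Omega_0$ if $B^2(\theta, \theta) < +\infty$ and

(12.1.3a)  $\mathscr{E}(S'(x; \theta)) + \mathscr{E}([S(x; \theta)]^2) = \dfrac{\partial^2}{\partial\theta^2} \int_{-\infty}^{\infty} dF(x; \theta) = 0.$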
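As a concrete check of these two regularity conditions, consider the normal distribution with unknown mean $\theta$ and unit variance. This example is supplied here for illustration and is not taken from the text; it uses only the definitions (12.1.4) and (12.1.5) in the continuous case.

```latex
\begin{align*}
f(x;\theta) &= \frac{1}{\sqrt{2\pi}}\, e^{-(x-\theta)^{2}/2},
  & \log f(x;\theta) &= -\tfrac12\log(2\pi) - \tfrac12 (x-\theta)^{2}, \\
S(x;\theta) &= \frac{\partial}{\partial\theta}\log f(x;\theta) = x - \theta,
  & S'(x;\theta) &= -1, \\
\mathscr{E}\bigl(S(x;\theta)\bigr) &= \mathscr{E}(x) - \theta = 0,
  & \mathscr{E}\bigl(S'(x;\theta)\bigr) + \mathscr{E}\bigl([S(x;\theta)]^{2}\bigr) &= -1 + 1 = 0 .
\end{align*}
```

Here $B^2(\theta, \theta) = \mathscr{E}([S(x; \theta)]^2) = 1$ is finite, so this family satisfies both conditions at every $\theta$.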

We return for a moment to $H(\theta, \theta')$ as defined in (12.1.6). If $F(x; \theta)$ is
regular with respect to its first two $\theta$-derivatives, it can be shown that
$H(\theta, \theta')$ has first partial derivatives with respect to $\theta$ and $\theta'$ over $\Omega_0 \times \Omega_0$.
Furthermore, $H(\theta, \theta')$, as a function of $\theta'$ for fixed $\theta$, has a maximum for
$\theta' = \theta$.
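The maximum property can be seen numerically in a simple discrete case. The sketch below is an illustration only: it assumes a two-point (Bernoulli) distribution with $p(1; \theta) = \theta$ and $p(0; \theta) = 1 - \theta$, for which $H(\theta, \theta') = \theta \log \theta' + (1 - \theta) \log(1 - \theta')$, and the function name `H` and the grid are choices made here.

```python
from math import log

def H(theta, theta_prime):
    """H(theta, theta') = sum_x log p(x; theta') * p(x; theta) for a
    two-point distribution with p(1; theta) = theta, p(0; theta) = 1 - theta."""
    return theta * log(theta_prime) + (1.0 - theta) * log(1.0 - theta_prime)

if __name__ == "__main__":
    theta = 0.3
    grid = [i / 100 for i in range(1, 100)]  # candidate values of theta'
    best = max(grid, key=lambda tp: H(theta, tp))
    print(best, H(theta, best))  # the maximizing theta' coincides with theta
```

Every value of $H$ on the grid is negative, in line with the remark that the limit is nonpositive, and the largest value occurs at $\theta' = \theta$.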


(b) Case of Vector Parameters 

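Now suppose $F(x; \theta)$ is a c.d.f. where $\theta$ is an $r$-dimensional parameter
with (functionally independent) components $(\theta_1, \ldots, \theta_r)$, the parameter
space being $\Omega_r$. For convenience, we leave $x$ one-dimensional; trivial
modifications of notation show that results hold if $x$ is $k$-dimensional.
Sufficient conditions under which we can differentiate

$\int_{-\infty}^{\infty} dF(x; \theta) = 1$

under the integral sign one or more times with respect to one or more of the
components of $\theta$ can be developed without additional difficulties.
Considering only two typical differentiations, we obtain corresponding to
(12.1.2) and (12.1.3), the following:

(12.1.13)  $\dfrac{\partial}{\partial\theta_p} \int_{-\infty}^{\infty} dF(x; \theta) = \int_{-\infty}^{\infty} \left[\dfrac{\partial}{\partial\theta_p} \log dF(x; \theta)\right] dF(x; \theta) = 0,$

(12.1.14)  $\dfrac{\partial^2}{\partial\theta_p\, \partial\theta_q} \int_{-\infty}^{\infty} dF(x; \theta) = \int_{-\infty}^{\infty} \left[\dfrac{\partial^2}{\partial\theta_p\, \partial\theta_q} \log dF(x; \theta)\right] dF(x; \theta)$
           $+ \int_{-\infty}^{\infty} \left[\dfrac{\partial}{\partial\theta_p} \log dF(x; \theta)\right] \left[\dfrac{\partial}{\partial\theta_q} \log dF(x; \theta)\right] dF(x; \theta) = 0.$

We are interested in the validity of (12.1.13) and (12.1.14) for $p,
q = 1, \ldots, r$ and for all points in an $r$-dimensional (Euclidean) open
interval $\Omega_{r0}$.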

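For discussion of validity of (12.1.13) and (12.1.14), and for later
reference, we write down expressions corresponding to (12.1.4) through
(12.1.9) as follows:

(12.1.15)  $S_p(x; \theta) = \dfrac{\partial}{\partial\theta_p} \log dF(x; \theta)$

(12.1.16)  $S_{pq}(x; \theta) = \dfrac{\partial^2}{\partial\theta_p\, \partial\theta_q} \log dF(x; \theta)$

(12.1.17)  $H(\theta, \theta') = \int_{-\infty}^{\infty} \log dF(x; \theta')\, dF(x; \theta)$

(12.1.18)  $A_p(\theta, \theta') = \int_{-\infty}^{\infty} S_p(x; \theta')\, dF(x; \theta)$

(12.1.19)  $B_{pq}(\theta, \theta') = \int_{-\infty}^{\infty} S_p(x; \theta')\, S_q(x; \theta')\, dF(x; \theta)$

(12.1.20)  $C_{pq}(\theta, \theta') = \int_{-\infty}^{\infty} S_{pq}(x; \theta')\, dF(x; \theta),$

where $p, q = 1, \ldots, r$, $\theta$ is a point in $\Omega_{r0}$ and $(\theta, \theta')$ a point in $\Omega_{r0} \times \Omega_{r0}$.
Since the components of $\theta$ are assumed to be functionally independent,
the components of the random variable $(S_p(x; \theta), p = 1, \ldots, r)$ are
linearly independent and hence the matrix $\|B_{pq}(\theta, \theta')\|$ is positive definite
for $(\theta, \theta')$ in $\Omega_{r0} \times \Omega_{r0}$. In the light of our discussion for the case of a
one-dimensional parameter, these functions require no additional comment.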

It is evident from the case of a one-dimensional parameter that for
(12.1.13) to hold, it is sufficient for $(\partial/\partial\theta'_p) \log dF(x; \theta')$ to be dominated
by integrable functions $h_{1p}(x)$, $p = 1, \ldots, r$. Similarly, for (12.1.14) to
hold it is sufficient for

$(\partial^2/\partial\theta'_p\, \partial\theta'_q) \log dF(x; \theta')$

and

$[(\partial/\partial\theta'_p) \log dF(x; \theta')][(\partial/\partial\theta'_q) \log dF(x; \theta')]$

to be dominated by integrable functions $h_{2pq}(x)$, $h_{3pq}(x)$, $p, q = 1, \ldots, r$
respectively.

Expressing (12.1.13) and (12.1.14) in terms of mean values, we shall say
that $F(x; \theta)$ is regular with respect to its first partial $\theta$-derivatives in $\Omega_{r0}$ if

(12.1.13a)  $\mathscr{E}(S_p(x; \theta)) = \dfrac{\partial}{\partial\theta_p} \int_{-\infty}^{\infty} dF(x; \theta) = 0,$

$p = 1, \ldots, r$ and regular with respect to its second partial $\theta$-derivatives in
$\Omega_{r0}$ if $\|B_{pq}(\theta, \theta)\|$ is finite and if

(12.1.14a)  $\mathscr{E}(S_p(x; \theta)\, S_q(x; \theta)) + \mathscr{E}(S_{pq}(x; \theta)) = \dfrac{\partial^2}{\partial\theta_p\, \partial\theta_q} \int_{-\infty}^{\infty} dF(x; \theta) = 0.$

(c) Remarks Concerning Extension to Vector Random Variables 

The preceding discussion relates to the case of a one-dimensional
random variable. The discussion extends to the case of a $k$-dimensional
random variable with minor changes. The main changes lie in the
definitions of $S(x; \theta)$ and $H(\theta, \theta')$ if $x$ is a vector $(x_1, \ldots, x_k)$. In this case the
simple difference $[F(x'; \theta) - F(x; \theta)]$ of $F(x; \theta)$ over the interval $(x, x']$
which occurs in (12.1.10) is replaced by the difference of $F(x_1, \ldots, x_k; \theta)$
over the $k$-dimensional interval $(x_1, \ldots, x_k; x'_1, \ldots, x'_k]$. The resulting
limit, which is assumed to exist, defines $(\partial/\partial\theta) \log dF(x_1, \ldots, x_k; \theta)$ and
is denoted by $S(x_1, \ldots, x_k; \theta)$.

Similarly, the difference $F(x_{\alpha+1}; \theta) - F(x_\alpha; \theta)$ of $F(x; \theta)$ over $(x_\alpha, x_{\alpha+1}]$
which appears in (12.1.11) is replaced by the $k$th difference of
$F(x_1, \ldots, x_k; \theta)$ over the $k$-dimensional interval $(x_{1,\alpha}, \ldots, x_{k,\alpha}; x_{1,\alpha+1}, \ldots,
x_{k,\alpha+1}]$ whereas $\Delta = \max_\alpha \{\Delta_\alpha\}$, where $\Delta_\alpha$ is the largest dimension of this
$k$-dimensional interval.

The remaining changes in notation are straightforward.

12.2 POINT ESTIMATION 


(a) Definitions 

Suppose $(x_1, \ldots, x_n)$ is an $n$-dimensional random variable from a c.d.f.
$F_n(x_1, \ldots, x_n; \theta)$, where $\theta$ is a one-dimensional real parameter with
parameter space $\Omega$.

Let $\tilde{\theta}(x_1, \ldots, x_n)$, or more briefly $\tilde{\theta}$, be a function of $(x_1, \ldots, x_n)$
where $\tilde{\theta}$ itself is a random variable. If the realized (observed) value of $\tilde{\theta}$
corresponding to a realized (observed) value of $(x_1, \ldots, x_n)$ is used for $\theta_0$,
the true value of $\theta$, then the random variable $\tilde{\theta}$ is called a point estimate or
estimator for $\theta_0$. This use of $\tilde{\theta}$ normally would be made, of course, only
when the value of $\theta_0$ is unknown. If, when $\theta = \theta_0$, $\mathscr{E}(\tilde{\theta}) = \theta_0$, which we
may write more briefly as $\mathscr{E}(\tilde{\theta} \mid \theta_0) = \theta_0$, then $\tilde{\theta}$ is called an unbiased
estimator for $\theta_0$. Actually, it would be more accurate to say that $\tilde{\theta}$ is an
estimator for $\theta_0$ unbiased in the mean. If $\tilde{\theta}$ were a statistic having a c.d.f.
$W(\tilde{\theta}; \theta_0)$ continuous at $\tilde{\theta} = \theta_0$, such that $W(\theta_0; \theta_0) = \tfrac{1}{2}$, we would say that
$\tilde{\theta}$ is an estimator for $\theta_0$ unbiased in the median. Unless otherwise indicated,
however, an unbiased estimator will be understood as being unbiased in
the mean.
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If an estimator 6 converges in probability to 0o as « oo it is called a 
consistent estimator for Oq. 

If 6 is an unbiased estimator for 0^ having finite variance, and has the 
further property that no other unbiased estimator has a smaller variance, 
then 0 is called an efficient estimator for 

If 0 is a statistic such that for any other statistic 0 the distribution of the 
conditional random variable 0 | 0 does not depend on 0 q, then 0 is called a 
sufficient statistic for 0o. If also <^'(0 | 0^,) = 0^ we shall say that 0 is a 
sufficient estimator for Oq. 

The concepts of consistency, efficiency, and sufficiency are due to 
Fisher (1922, 1925b). 

For simple random sampling, that is, where $(x_1, \ldots, x_n)$ is a random sample of size $n$ from a c.d.f. $F(x; \theta)$, these notions of unbiasedness, consistency, sufficiency, and efficiency are of special importance. In this case

(12.2.1)   $F_n(x_1, \ldots, x_n; \theta) = \prod_{\xi=1}^{n} F(x_\xi; \theta).$

For a given sample $(x_1, \ldots, x_n)$, the quantity $dF_n = \prod_{\xi=1}^{n} dF(x_\xi; \theta)$ is called the likelihood element of $\theta$ for $(x_1, \ldots, x_n)$.

Most of the material in this chapter will be concerned with simple 
random sampling. 

(b) Lower Bound of Variance of an Estimator 

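Let $(x_1, \ldots, x_n)$ be an $n$-dimensional random variable having c.d.f. $F_n(x_1, \ldots, x_n; \theta)$. If $\tilde\theta(x_1, \ldots, x_n)$ is an unbiased estimator for $\theta$, then we have

(12.2.2)   $\int_{R_n} (\tilde\theta - \theta) \, dF_n = 0.$

If $F_n(x_1, \ldots, x_n; \theta)$ is regular with respect to its first $\theta$-derivative in some (open) interval $\Omega_0$ containing $\theta_0$, the true value of $\theta$, we may differentiate (12.2.2) under the integral sign and obtain

(12.2.3)   $\int_{R_n} (\tilde\theta - \theta) S_n(x_1, \ldots, x_n; \theta) \, dF_n = 1$

where

(12.2.4)   $S_n(x_1, \ldots, x_n; \theta) = \frac{\partial}{\partial\theta} \log dF_n.$

Applying the Schwarz inequality to (12.2.3), we obtain

$1 = \left[ \int_{R_n} (\tilde\theta - \theta) S_n \, dF_n \right]^2 \le \int_{R_n} (\tilde\theta - \theta)^2 \, dF_n \cdot \int_{R_n} S_n^2 \, dF_n.$

Since

(12.2.5)   $\sigma^2(\tilde\theta) = \int_{R_n} (\tilde\theta - \theta)^2 \, dF_n$

we therefore obtain

(12.2.6)   $\sigma^2(\tilde\theta) \ge \frac{1}{\mathcal{E}(S_n^2)}$   for $\theta \in \Omega_0$

the equality holding if and only if

(12.2.7)   $K[\tilde\theta(x_1, \ldots, x_n) - \theta] = S_n(x_1, \ldots, x_n; \theta)$

in $R_n$ with probability 1, where $K$ depends possibly on $\theta$ but not on $(x_1, \ldots, x_n)$.

If equality holds in (12.2.6), we say that $\tilde\theta$ is an efficient estimator for $\theta$, in which case (12.2.7) gives the form which $\tilde\theta$ takes.

If $\sigma^2(\tilde\theta)$ is denoted by $\sigma^2(\tilde\theta \mid \theta_0)$ when evaluated for $\theta = \theta_0$, that is, by putting $\theta = \theta_0$ in (12.2.5), with a similar meaning for $\mathcal{E}(S_n^2 \mid \theta_0)$, then

$\sigma^2(\tilde\theta \mid \theta_0) \ge 1/[\mathcal{E}(S_n^2 \mid \theta_0)].$

An efficient estimator of $\theta$ is usually denoted by $\hat\theta$.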

Summarizing, we have the following result: 

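12.2.1 Suppose $(x_1, \ldots, x_n)$ is a random variable having c.d.f. $F_n(x_1, \ldots, x_n; \theta)$ where $\theta$ is one-dimensional, and where $F_n(x_1, \ldots, x_n; \theta)$ is regular in its first $\theta$-derivative in $\Omega_0$. If $\tilde\theta(x_1, \ldots, x_n)$ is an unbiased estimator for $\theta_0$, then $\sigma^2(\tilde\theta \mid \theta_0) \ge 1/[\mathcal{E}(S_n^2 \mid \theta_0)]$ where $S_n$ is given by (12.2.4), the equality holding if and only if (12.2.7) holds with probability 1 at $\theta = \theta_0$. If an efficient estimator $\hat\theta$ exists for $\theta_0$ its variance is $1/\mathcal{E}(S_n^2 \mid \theta_0)$.

Lower bounds for $\sigma^2(\tilde\theta \mid \theta_0)$ without the regularity assumption have been given by Chapman and Robbins (1951) and by Kiefer (1952).

If an efficient estimator $\hat\theta$ exists for $\theta_0$, and if $\tilde\theta$ is any other unbiased estimator for $\theta_0$, then the efficiency of $\tilde\theta$ for estimating $\theta_0$ is defined by the ratio

(12.2.8)   $\mathrm{eff}(\tilde\theta \mid \theta_0) = \frac{\sigma^2(\hat\theta \mid \theta_0)}{\sigma^2(\tilde\theta \mid \theta_0)}.$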

(c) Case of Random Sampling 
In this important case (12.2.1) holds and

(12.2.9)   $S_n(x_1, \ldots, x_n; \theta) = \sum_{\xi=1}^{n} S(x_\xi; \theta)$

where

(12.2.10)   $S(x; \theta) = \frac{\partial}{\partial\theta} \log dF(x; \theta).$

$\sum_{\xi=1}^{n} S(x_\xi; \theta)$ is called the score for $\theta$ based on $x_1, \ldots, x_n$. Furthermore

(12.2.11)   $\mathcal{E}(S_n^2) = n\mathcal{E}(S^2) = nB^2(\theta, \theta)$

where $B^2(\theta, \theta)$ is given in (12.1.8). Therefore (12.2.6) specializes to

(12.2.12)   $\sigma^2(\tilde\theta) \ge \frac{1}{nB^2(\theta, \theta)}.$

This result was originally stated by Fisher (1922). It was later established by Cramér (1946), Dugué (1937), and Rao (1945).

Equality in (12.2.12) holds if and only if

(12.2.13)   $K(\tilde\theta - \theta) = \sum_{\xi=1}^{n} S(x_\xi; \theta)$

over $R_n$ with probability 1 where $K$ does not depend on $(x_1, \ldots, x_n)$. Thus, if an efficient estimator $\hat\theta$ exists, it is the statistic $\tilde\theta$ which satisfies (12.2.13) and its variance is given by

(12.2.14)   $\sigma^2(\hat\theta) = \frac{1}{nB^2(\theta, \theta)}.$

Thus, we have the following important corollary of 12.2.1:

12.2.1a If $(x_1, \ldots, x_n)$ is a sample from the c.d.f. $F(x; \theta)$ which is regular with respect to its first $\theta$-derivative in $\Omega_0$, and if $\tilde\theta$ is an unbiased estimator for $\theta_0$, then $\sigma^2(\tilde\theta \mid \theta_0) \ge 1/[nB^2(\theta_0, \theta_0)]$ where $B^2(\theta, \theta)$ is given by (12.1.8). Furthermore, equality holds if and only if $\tilde\theta$ satisfies (12.2.13), in which case the solution for $\tilde\theta$, denoted by $\hat\theta$, is an estimator for $\theta_0$ with variance $1/[nB^2(\theta_0, \theta_0)]$.

If $\tilde\theta$ is an arbitrary unbiased estimator for $\theta_0$, and if an efficient estimator $\hat\theta$ exists for $\theta_0$, the efficiency of $\tilde\theta$ in estimating $\theta_0$ is defined by

(12.2.15)   $\mathrm{eff}(\tilde\theta \mid \theta_0) = \frac{\sigma^2(\hat\theta \mid \theta_0)}{\sigma^2(\tilde\theta \mid \theta_0)} = \frac{1}{\sigma^2(\sqrt{n}\,\tilde\theta \mid \theta_0) \, B^2(\theta_0, \theta_0)}.$

Fisher (1922) has called $nB^2(\theta_0, \theta_0)$, the reciprocal of the variance of an efficient estimator $\hat\theta$, the amount of information contained in the sample regarding $\theta_0$, or $B^2(\theta_0, \theta_0)$ the amount of information about $\theta_0$ per observation from $F(x; \theta_0)$.

From this point of view the efficiency of an unbiased estimator $\tilde\theta$ for $\theta_0$ may be regarded as the fraction of information contained in $\tilde\theta$ for estimating $\theta_0$ relative to that contained in an efficient estimator for estimating $\theta_0$.


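Example. Suppose $(x_1, \ldots, x_n)$ is a sample from the Poisson distribution $Po(\mu)$ referred to in 8.3.3c. We have

$S_n(x_1, \ldots, x_n; \mu) = \frac{\partial}{\partial\mu} \log \prod_{\xi=1}^{n} dF(x_\xi; \mu) = \frac{n}{\mu} \left[ \frac{1}{n} \sum_{\xi=1}^{n} x_\xi - \mu \right]$

which, according to (12.2.13), shows that $(1/n) \sum_{\xi=1}^{n} x_\xi$, that is, the sample mean, is an efficient estimator for $\mu_0$, the true value of $\mu$, for any given sample size. Denoting $\frac{1}{n} \sum_{\xi=1}^{n} x_\xi$ by $\hat\mu(x_1, \ldots, x_n)$, or $\hat\mu$, we find the variance of $\hat\mu$ by applying (12.2.14), that is,

$\sigma^2(\hat\mu \mid \mu_0) = \frac{\mu_0}{n}.$

Also, we have

$\sigma^2(\sqrt{n}\,\hat\mu \mid \mu_0) = \mu_0$

and hence it follows from 12.2.1a that it is impossible to find an unbiased estimator $\tilde\mu(x_1, \ldots, x_n)$ for $\mu_0$ for which $\sigma^2(\sqrt{n}\,\tilde\mu \mid \mu_0) < \mu_0$.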
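The bound in this example can be checked numerically. The following is a minimal sketch, not part of the original text: it approximates $B^2(\mu, \mu)$ for the Poisson distribution by a truncated sum and compares $1/[nB^2(\mu_0, \mu_0)]$ with $\mu_0/n$. The function names, the cutoff of 500 terms (adequate only for moderate $\mu$), and the values $\mu_0 = 3$, $n = 20$ are illustrative assumptions.

```python
import math

def poisson_pf(x, mu):
    """Poisson p.f. mu**x * exp(-mu) / x!, evaluated in log space for stability."""
    return math.exp(x * math.log(mu) - mu - math.lgamma(x + 1))

def information_per_observation(mu, cutoff=500):
    """Truncated sum for B^2(mu, mu) = E[S(x; mu)^2], where S(x; mu) = x/mu - 1."""
    return sum((x / mu - 1.0) ** 2 * poisson_pf(x, mu) for x in range(cutoff))

def variance_lower_bound(mu, n):
    """Right-hand side of (12.2.12): 1 / (n * B^2(mu, mu))."""
    return 1.0 / (n * information_per_observation(mu))

mu0, n = 3.0, 20  # hypothetical true value and sample size
# The bound and the variance mu0/n of the sample mean should agree closely.
print(variance_lower_bound(mu0, n), mu0 / n)
```

Since the two printed values coincide up to rounding, the sample mean attains the bound, which is what efficiency means here.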


(d) Lower Bound of Variance of a Biased Estimator 

A generalization of 12.2.1 (see Cramér (1946), Dugué (1937) and Rao (1945)) can be obtained by considering a biased estimator $\tilde\theta$ for $\theta$ as follows. We replace $(\tilde\theta - \theta)$ in (12.2.2) by

(12.2.16)   $[\tilde\theta - \theta - b_n(\theta)]$

where $b_n(\theta)$ is the bias of the estimator $\tilde\theta$ and repeat essentially the same argument as that involved in 12.2.1a. Then in (12.2.3) we would replace $(\tilde\theta - \theta)$ by $(\tilde\theta - \theta - b_n(\theta))$ and 1 on the right by $1 + b_n'(\theta)$, where $b_n'(\theta) = (d/d\theta)\, b_n(\theta)$. In place of (12.2.12) we obtain

(12.2.17)   $\sigma^2(\tilde\theta) \ge \frac{[1 + b_n'(\theta)]^2}{nB^2(\theta, \theta)}$

with the equality holding if and only if

(12.2.18)   $K[\tilde\theta - \theta - b_n(\theta)] = \sum_{\xi=1}^{n} S(x_\xi; \theta)$

over $R_n$ with probability 1, and where $K$ does not depend on $(x_1, \ldots, x_n)$.


(e) Properties of Sufficient Estimators 

Suppose $(x_1, \ldots, x_n)$ is a random variable with p.d.f. $f_n(x_1, \ldots, x_n; \theta)$ which factors as follows:

(12.2.19)   $f_n(x_1, \ldots, x_n; \theta) = v(\tilde\theta; \theta) \, u(x_1, \ldots, x_n \mid \tilde\theta)$

where $v(\tilde\theta; \theta)$ is the p.d.f. of $\tilde\theta$, and $u(x_1, \ldots, x_n \mid \tilde\theta)$ is the p.d.f. of the conditional random variable $(x_1, \ldots, x_n \mid \tilde\theta)$ which does not depend on $\theta$.

Then $\tilde\theta$ is a sufficient statistic for $\theta$. For if $\theta^*$ is any other statistic which does not depend on $\tilde\theta$, the distribution of the conditional random variable $\theta^* \mid \tilde\theta$ is completely determined from $u(x_1, \ldots, x_n \mid \tilde\theta)$.

Conversely, suppose $\tilde\theta$ is a sufficient statistic with p.d.f. $v(\tilde\theta; \theta)$. Let $y_2, \ldots, y_n$ be any $n - 1$ further statistics such that $\tilde\theta, y_2, \ldots, y_n$ have a p.d.f. as follows:

(12.2.20)   $g(\tilde\theta, y_2, \ldots, y_n; \theta) = f_n(x_1, \ldots, x_n; \theta) \cdot \frac{1}{|J|}$

where $J$ is the Jacobian of $(\tilde\theta, y_2, \ldots, y_n)$ with respect to $x_1, \ldots, x_n$.

[For conditions under which (12.2.20) hold, see Section 2A{d),] 


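Let $h(y_2, \ldots, y_n \mid \tilde\theta; \theta)$ be the p.d.f. of the conditional random variable $y_2, \ldots, y_n \mid \tilde\theta$. Then

(12.2.21)   $h(y_2, \ldots, y_n \mid \tilde\theta; \theta) = \frac{g(\tilde\theta, y_2, \ldots, y_n; \theta)}{v(\tilde\theta; \theta)}$

where $v(\tilde\theta; \theta) \ne 0$ and is given by

(12.2.22)   $v(\tilde\theta; \theta) = \int g(\tilde\theta, y_2, \ldots, y_n; \theta) \, dy_2 \cdots dy_n.$

If $h(y_2, \ldots, y_n \mid \tilde\theta; \theta)$ does not depend on $\theta$, and if we denote it by $h^*(y_2, \ldots, y_n \mid \tilde\theta)$, then

(12.2.23)   $g(\tilde\theta, y_2, \ldots, y_n; \theta) = v(\tilde\theta; \theta) \, h^*(y_2, \ldots, y_n \mid \tilde\theta).$

Using (12.2.20), we therefore have

(12.2.24)   $f_n(x_1, \ldots, x_n; \theta) = v(\tilde\theta; \theta) \, h^*(y_2, \ldots, y_n \mid \tilde\theta) \cdot |J|.$

Note that $h^*(y_2, \ldots, y_n \mid \tilde\theta) \cdot |J|$ depends on $(\tilde\theta, y_2, \ldots, y_n)$, and hence on $(x_1, \ldots, x_n)$, but not on $\theta$. Thus any statistic depending on $(x_1, \ldots, x_n)$ through $(y_2, \ldots, y_n)$ but not on $\tilde\theta$ would have a distribution which would not depend on $\theta$.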

We may therefore summarize as follows:

12.2.2 Let $(x_1, \ldots, x_n)$ be a random variable having p.d.f. $f_n(x_1, \ldots, x_n; \theta)$. A necessary and sufficient condition for a statistic $\tilde\theta(x_1, \ldots, x_n)$ to be sufficient for $\theta$ is that

(12.2.25)   $f_n(x_1, \ldots, x_n; \theta) = v(\tilde\theta; \theta) \, w(x_1, \ldots, x_n)$

where $v(\tilde\theta; \theta)$ is the p.d.f. of $\tilde\theta$ and $w(x_1, \ldots, x_n)$ is a function of $(x_1, \ldots, x_n)$ which does not depend on $\theta$.

Fisher (1922) first pointed out the sufficiency of the factorability criterion (12.2.25). Neyman (1935) showed it is also a necessary condition.

In the special but important case of simple random sampling where $(x_1, \ldots, x_n)$ is a sample from a p.d.f. $f(x; \theta)$, we would have a corollary of 12.2.2 in which $f_n(x_1, \ldots, x_n; \theta)$ takes the special form $\prod_{\xi=1}^{n} f(x_\xi; \theta)$.

In a similar manner, one can show that if $(x_1, \ldots, x_n)$ is an $n$-dimensional discrete random variable with p.f. $p_n(x_1, \ldots, x_n; \theta)$, a necessary and sufficient condition for a statistic $\tilde\theta$ to be sufficient for $\theta$ is that $p_n(x_1, \ldots, x_n; \theta)$ factor as follows:

(12.2.26)   $p_n(x_1, \ldots, x_n; \theta) = v(\tilde\theta; \theta) \, w(x_1, \ldots, x_n)$

where $v(\tilde\theta; \theta)$ is the p.f. of $\tilde\theta$ and where $w(x_1, \ldots, x_n)$ depends on $(x_1, \ldots, x_n)$ but not on $\theta$.

For simple random sampling from a p.f. $p(x; \theta)$, $p_n(x_1, \ldots, x_n; \theta)$ would take the form $\prod_{\xi=1}^{n} p(x_\xi; \theta)$.

Finally, we remark that 12.2.2 can be extended to the case where $\theta$ is a vector $(\theta_1, \ldots, \theta_r)$, $r < n$. A necessary and sufficient condition for $(\tilde\theta_1, \ldots, \tilde\theta_s)$ to be a set of sufficient statistics for $(\theta_1, \ldots, \theta_r)$, $s \ge r$, is that

(12.2.27)   $f_n(x_1, \ldots, x_n; \theta_1, \ldots, \theta_r) = v(\tilde\theta_1, \ldots, \tilde\theta_s; \theta_1, \ldots, \theta_r) \, w(x_1, \ldots, x_n)$

where $w(x_1, \ldots, x_n)$ depends on $(x_1, \ldots, x_n)$ but not on $\theta$.

A similar factorability criterion holds, of course, for the case in which $\theta$ is a vector $(\theta_1, \ldots, \theta_r)$ and $(x_1, \ldots, x_n)$ is a discrete random variable having p.f. $p_n(x_1, \ldots, x_n; \theta_1, \ldots, \theta_r)$.

For a random variable $(x_1, \ldots, x_n)$ having a general distribution, a generalization of the factorability criterion stated in 12.2.2 together with its extension to the case of a vector parameter has been given by Halmos and Savage (1949) by making use of the Radon-Nikodym theorem. An even more abstract treatment of necessary and sufficient conditions for the characterization of sufficient statistics has been given by Bahadur (1954).


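Example. As an example of a sufficient estimator suppose $(x_1, \ldots, x_n)$ is a sample from the Poisson distribution $Po(\mu)$. We have for arbitrary $\mu$ on $(0, \infty)$,

$dF(x; \mu) = \frac{\mu^x e^{-\mu}}{x!}$

and

$\prod_{\xi=1}^{n} dF(x_\xi; \mu) = \frac{\mu^{\sum x_\xi} e^{-n\mu}}{\prod_{\xi=1}^{n} x_\xi!}$

which can be written as

$\left[ \frac{(n\mu)^{n\hat\mu} e^{-n\mu}}{(n\hat\mu)!} \right] \cdot \left\{ \frac{(n\hat\mu)!}{n^{n\hat\mu} \prod_{\xi=1}^{n} x_\xi!} \right\}$

where $\hat\mu = (1/n) \sum_{\xi=1}^{n} x_\xi$. The expression in [ ] is the p.f. of $\hat\mu$, that is,

$dV(\hat\mu; \mu) = \frac{(n\mu)^{n\hat\mu} e^{-n\mu}}{(n\hat\mu)!}$

whereas the expression in { } is actually the p.f. of the conditional random variable $(x_1, \ldots, x_n \mid \hat\mu)$, that is,

$\frac{(n\hat\mu)!}{n^{n\hat\mu} \prod_{\xi=1}^{n} x_\xi!}.$

It is evident that if $\mu^*$ is any other estimator, then the p.f. $p(\mu^* \mid \hat\mu)$ of the conditional random variable $\mu^* \mid \hat\mu$ is obtained by summing the expression in { } over those positive integral (or zero) values of the $x_\xi$ subject to the two conditions $\mu^*(x_1, \ldots, x_n) = \mu^*$ and $\hat\mu(x_1, \ldots, x_n) = \hat\mu$. It is seen that $p(\mu^* \mid \hat\mu)$ does not depend on $\mu$. Hence, $(1/n) \sum_{\xi=1}^{n} x_\xi$ is sufficient for estimating $\mu$.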
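The factorization used in this example can be verified directly. The sketch below is illustrative only: it evaluates the joint p.f. of a small Poisson sample, the p.f. of the sum (Poisson with mean $n\mu$), and the conditional p.f. given the sum, and checks that the first equals the product of the other two for several values of $\mu$. The function names and the sample (2, 0, 3, 1) are assumptions made for the illustration.

```python
import math
from itertools import product

def joint_pf(xs, mu):
    """Joint p.f. of an independent Poisson sample: product of mu**x e**(-mu) / x!."""
    return math.prod(mu ** x * math.exp(-mu) / math.factorial(x) for x in xs)

def pf_of_sum(t, n, mu):
    """p.f. of t = x_1 + ... + x_n, which is Poisson with mean n*mu."""
    return (n * mu) ** t * math.exp(-n * mu) / math.factorial(t)

def conditional_pf(xs):
    """p.f. of (x_1, ..., x_n) given their sum t: t! / (n**t * prod x_i!); no mu appears."""
    t, n = sum(xs), len(xs)
    return math.factorial(t) / (n ** t * math.prod(math.factorial(x) for x in xs))

xs = (2, 0, 3, 1)  # hypothetical observed sample
for mu in (0.5, 2.0, 7.0):
    lhs = joint_pf(xs, mu)
    rhs = pf_of_sum(sum(xs), len(xs), mu) * conditional_pf(xs)
    print(mu, lhs, rhs)

# The conditional p.f. sums to 1 over all samples of size 3 whose total is 3.
total = sum(conditional_pf(c) for c in product(range(4), repeat=3) if sum(c) == 3)
print(total)
```

Because `conditional_pf` takes no `mu` argument at all, its independence of $\mu$ is visible in the code itself.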


An important property of a sufficient estimator is that if one starts with 
any initial unbiased estimator of the parameter that is not a function of 
the sufficient estimator one can find an unbiased estimator depending on 
the sufficient estimator which has a smaller variance than that of the initial 
estimator. More precisely the situation is stated in the following theorem 
due to Blackwell (1947) and Rao (1945): 

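12.2.3 Suppose $\tilde\theta$ is a sufficient statistic for $\theta_0$ and $\theta^*$ is any unbiased estimator for $\theta_0$. Let $\mathcal{E}(\theta^* \mid \tilde\theta) = h(\tilde\theta)$. Then $h(\tilde\theta)$ is an unbiased estimator for $\theta_0$ whose variance cannot exceed that of $\theta^*$.

To prove 12.2.3 let $G(\tilde\theta, \theta^*; \theta_0)$ be the c.d.f. of $(\tilde\theta, \theta^*)$, $V(\tilde\theta; \theta_0)$ the c.d.f. of $\tilde\theta$, and $U(\theta^* \mid \tilde\theta)$ the c.d.f. of the conditional random variable $\theta^* \mid \tilde\theta$. Then

(12.2.28)   $h(\tilde\theta) = \int_{-\infty}^{\infty} \theta^* \, dU(\theta^* \mid \tilde\theta),$

and since $\mathcal{E}(h(\tilde\theta)) = \mathcal{E}(\theta^*) = \theta_0$, we see that $h(\tilde\theta)$ is an unbiased estimator for $\theta_0$.

For the variance of $\theta^*$ we have

$\sigma^2(\theta^*) = \mathcal{E}(\theta^* - \theta_0)^2 = \mathcal{E}[(h(\tilde\theta) - \theta_0) + (\theta^* - h(\tilde\theta))]^2$

$= \sigma^2(h(\tilde\theta)) + \mathcal{E}(\theta^* - h(\tilde\theta))^2 + 2\mathcal{E}[(h(\tilde\theta) - \theta_0)(\theta^* - h(\tilde\theta))].$

But

$\mathcal{E}[(h(\tilde\theta) - \theta_0)(\theta^* - h(\tilde\theta))] = \int_{-\infty}^{\infty} [h(\tilde\theta) - \theta_0] \left\{ \int_{-\infty}^{\infty} (\theta^* - h(\tilde\theta)) \, dU(\theta^* \mid \tilde\theta) \right\} dV(\tilde\theta; \theta_0) = 0,$

since the quantity in { } vanishes. Therefore, since $\mathcal{E}(\theta^* - h(\tilde\theta))^2 \ge 0$, we have

(12.2.29)   $\sigma^2(\theta^*) \ge \sigma^2(h(\tilde\theta))$

which completes the argument for 12.2.3.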
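A concrete instance of 12.2.3 may help. The following sketch is an illustration that does not appear in the text: for a Poisson sample, take as the quantity to be estimated $e^{-\mu}$, the probability of observing zero, and as the initial unbiased estimator the indicator of the event $x_1 = 0$. Conditioning on the sufficient statistic $t = \sum x_\xi$ gives $h(t) = ((n-1)/n)^t$, because $x_1$ given $t$ is binomial with $t$ trials and probability $1/n$. The code computes the mean and variance of $h$ by truncated sums; all names and the values $\mu = 1.5$, $n = 10$ are assumptions.

```python
import math

def poisson_pf(x, mean):
    """Poisson p.f. evaluated in log space."""
    return math.exp(x * math.log(mean) - mean - math.lgamma(x + 1))

def moments_of_h(mu, n, cutoff=2000):
    """Mean and variance of h(t) = ((n-1)/n)**t where t ~ Poisson(n*mu)."""
    r = (n - 1) / n
    m1 = sum(r ** t * poisson_pf(t, n * mu) for t in range(cutoff))
    m2 = sum(r ** (2 * t) * poisson_pf(t, n * mu) for t in range(cutoff))
    return m1, m2 - m1 ** 2

mu, n = 1.5, 10                      # hypothetical true mean and sample size
target = math.exp(-mu)               # quantity being estimated
var_initial = target * (1 - target)  # variance of the indicator of {x_1 = 0}
mean_h, var_h = moments_of_h(mu, n)
print(target, mean_h)      # h is unbiased: the two values should agree
print(var_initial, var_h)  # the conditioned estimator has the smaller variance
```

The design choice here is to use exact (truncated) sums rather than simulation, so the comparison in (12.2.29) is not blurred by sampling noise.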

12.3 POINT ESTIMATION FROM LARGE SAMPLES 

(a) Asymptotic Distribution of the Score 

An efficient estimator of a parameter exists for samples of size $n$, only for certain special cases of c.d.f.'s $F(x; \theta)$. In such cases, $S_n(x_1, \ldots, x_n; \theta)$, which is given by (12.2.9), takes on the special form given in (12.2.13), and the efficient estimator $\hat\theta(x_1, \ldots, x_n)$ is therefore essentially given by solving the equation

(12.3.1)   $S_n(x_1, \ldots, x_n; \theta) = 0$

for $\theta$.

But suppose $S_n(x_1, \ldots, x_n; \theta)$ does not have the special form indicated by (12.2.13) but that (12.3.1) does have a solution for $\theta$, which we can, without ambiguity, continue to call $\hat\theta_n(x_1, \ldots, x_n)$. What properties does this solution have as an estimator for $\theta_0$? The answer is that under certain conditions to be developed presently, $\hat\theta_n(x_1, \ldots, x_n)$ is an efficient estimator for $\theta_0$ in an asymptotic sense for large samples.

To deal with this question we first establish the following result: 

12.3.1 Suppose $(x_1, \ldots, x_n)$ is a sample from a c.d.f. $F(x; \theta_0)$, where $F(x; \theta)$ is regular with respect to its first $\theta$-derivative in $\Omega_0$. Then if $B^2(\theta, \theta)$, as defined in (12.1.8), exists and is finite, $S_n(x_1, \ldots, x_n; \theta_0)$ is asymptotically distributed according to $N(0, nB^2(\theta_0, \theta_0))$.

This is essentially a corollary of 9.2.1. For if we denote the random variable $S(x; \theta)$ by $y$, then $y$ has a c.d.f. $G(y)$ defined by

$G(y) = \int_{E_y} dF(x; \theta)$

where $E_y$ is the set of points on the $x$-axis for which $S(x; \theta) \le y$. Since $F(x; \theta)$ is regular with respect to its first $\theta$-derivative, it is clear that $\mathcal{E}(y) = 0$. Thus $S_n(x_1, \ldots, x_n; \theta)$ is the sum of a sample of size $n$ from a population having zero mean and finite variance $B^2(\theta, \theta)$. Applying 9.2.1 we obtain the conclusion of 12.3.1.

(b) Convergence of Maximum Likelihood Estimators

It is clear that under the conditions stated in 12.3.1 the random variable $(1/n) S_n(x_1, \ldots, x_n; \theta_0)$ converges in probability to 0. This suggests that if we set

(12.3.2)   $S_n(x_1, \ldots, x_n; \theta) = 0$

we should obtain a sequence of sets of roots $\{\hat\theta(x_1, \ldots, x_n)\}$, $n = 1, 2, \ldots$, each set containing at least one root, from which a sequence can be found which converges to the true value of $\theta$ with probability 1. We shall show this is true under certain conditions. Let us assume that $S(x; \theta)$ is a continuous function of $\theta$ in $\Omega_0$ for all values of $x$ in $R_1$ except possibly for a set of probability zero. Furthermore, we shall assume that $F(x; \theta)$ is regular with respect to its first $\theta$-derivative in $\Omega_0$, from which it follows that $A(\theta_0, \theta)$ as defined in (12.1.7) is continuous and strictly decreasing in $\theta$ over some subinterval of $\Omega_0$ which contains $\theta_0$.

Referring to (12.2.9) and the proof of 12.3.1, it is evident that $(1/n) S_n(x_1, \ldots, x_n; \theta)$ is the mean of a sample of size $n$ from a population having mean $A(\theta_0, \theta)$ if $\theta_0$ is the true value of $\theta$. Therefore by 4.6.1, $(1/n) S_n(x_1, \ldots, x_n; \theta)$ converges almost certainly to $A(\theta_0, \theta)$. Without loss of generality, we may take $\Omega_0$ to be $(\theta_0 - \delta, \theta_0 + \delta)$ where $\delta > 0$. Thus $A(\theta_0, \theta)$ is monotonically decreasing over this interval and since $A(\theta_0, \theta_0) = 0$ we have $A(\theta_0, \theta_0 - \delta) > 0$ and $A(\theta_0, \theta_0 + \delta) < 0$. Therefore there exists an $n(\delta, \epsilon)$ so that the probability exceeds $1 - \epsilon$ that both of the following inequalities hold for all $n > n(\delta, \epsilon)$ if $\theta_0$ is the true value of $\theta$:

(12.3.3)   $\frac{1}{n} S_n(x_1, \ldots, x_n; \theta) > 0$ if $\theta = \theta_0 - \delta$,
           $\frac{1}{n} S_n(x_1, \ldots, x_n; \theta) < 0$ if $\theta = \theta_0 + \delta$.

Since $S(x; \theta)$ is continuous in $\theta$ over $(\theta_0 - \delta, \theta_0 + \delta)$ for all $x$ in $R_1$ except for a set of probability 0, a similar statement holds for $\frac{1}{n} \sum_{\xi=1}^{n} S(x_\xi; \theta)$, that is, $\frac{1}{n} S_n(x_1, \ldots, x_n; \theta)$.

Therefore, if $\theta_0$ is the true value of $\theta$, we have

(12.3.4)   $P\left( \frac{1}{n} S_n(x_1, \ldots, x_n; \theta) = 0 \text{ for some } \theta \text{ in } (\theta_0 \pm \delta) \text{ for all } n > n(\delta, \epsilon) \;\middle|\; \theta_0 \right) > 1 - \epsilon$

which is equivalent to the statement that a sequence of roots of (12.3.2) exists which converge almost certainly to $\theta_0$. In particular, if (12.3.2) has a unique solution $\hat\theta(x_1, \ldots, x_n)$ for $n = n_0, n_0 + 1, \ldots$, for some integer $n_0$, then the sequence $\hat\theta(x_1, \ldots, x_n)$, $n = n_0, n_0 + 1, \ldots$ converges almost certainly to $\theta_0$.

Summarizing: 

12.3.2 Suppose $(x_1, \ldots, x_n)$ is a sample from the c.d.f. $F(x; \theta_0)$, where $F(x; \theta)$ is regular with respect to its first $\theta$-derivative in $\Omega_0$. Let $S(x; \theta)$ be continuous in $\theta$ for all values of $x$ in $R_1$ except possibly for a set of probability zero. Then there exists a sequence of solutions of (12.3.2) which converge almost certainly to $\theta_0$. In particular, if (12.3.2) has a unique solution $\hat\theta(x_1, \ldots, x_n)$ for $n \ge$ some $n_0$, then the sequence $\hat\theta(x_1, \ldots, x_n)$, $n = n_0, n_0 + 1, \ldots$ converges almost certainly to $\theta_0$.

In view of the fact that $S_n(x_1, \ldots, x_n; \theta)$ is the first $\theta$-derivative of the logarithm of the likelihood element, namely, $\sum_{\xi=1}^{n} \log dF(x_\xi; \theta)$, and that $\hat\theta(x_1, \ldots, x_n)$ is a value of $\theta$ for which $S_n(x_1, \ldots, x_n; \theta)$ vanishes, $\hat\theta$, if it is unique and maximizes the likelihood element, is called the maximum likelihood estimator for $\theta_0$, a term introduced by Fisher (1922). Wald (1949b) has shown that the solution of (12.3.2) which maximizes the likelihood is, under certain conditions, a consistent estimator for $\theta_0$. Other detailed analyses of the consistency of maximum likelihood estimators and related problems have been made by Barankin and Gurland (1951), Huzurbazar (1948), LeCam (1956), Wald (1948, 1949b), and others.
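When the score equation (12.3.2) has no closed-form solution, its root can be located numerically. The sketch below is an illustration under stated assumptions, not a method taken from the text: it uses the logistic location family with unit scale, for which $S(x; \theta) = \tanh((x - \theta)/2)$, so that $S_n$ is strictly decreasing in $\theta$ and has exactly one root between the smallest and largest observations. The function names and the sample values are hypothetical.

```python
import math

def score(xs, theta):
    """S_n(x_1, ..., x_n; theta) for the logistic location family (unit scale)."""
    return sum(math.tanh((x - theta) / 2.0) for x in xs)

def solve_score_equation(xs, steps=200):
    """Bisection for the root of the score on [min(xs), max(xs)].

    The score is non-negative at min(xs), non-positive at max(xs),
    and decreasing in theta, so the bracket always contains the root.
    """
    lo, hi = min(xs), max(xs)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if score(xs, mid) > 0:
            lo = mid  # score still positive: the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

sample = [-1.3, 0.2, 0.4, 1.1, 2.5]  # hypothetical observations
theta_hat = solve_score_equation(sample)
print(theta_hat, score(sample, theta_hat))
```

Bisection is used rather than Newton's method because it needs only the sign of the score and cannot leave the bracket; for families whose score is not monotone, several roots may exist and this sketch would not apply as written.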

(c) Asymptotic Distribution of Maximum Likelihood Estimators 

The assumption of regularity of $F(x; \theta)$ with respect to its second $\theta$-derivative is strong enough to enable us to make the following statement about the asymptotic distribution of $\hat\theta$ for large $n$:

12.3.3 If $(x_1, \ldots, x_n)$ is a sample from the c.d.f. $F(x; \theta_0)$, where $F(x; \theta)$ is regular with respect to its second $\theta$-derivative in $\Omega_0$, and if the maximum likelihood estimator $\hat\theta$ for $\theta_0$ is unique for $n \ge$ some $n_0$, and a random variable (measurable) with respect to $\prod_{\xi=1}^{n} F(x_\xi; \theta)$, as defined in (12.2.1), its distribution is asymptotically normal $N(\theta_0, 1/[nB^2(\theta_0, \theta_0)])$ for large $n$.

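Since $S(x; \theta)$ has a $\theta$-derivative everywhere in $\Omega_0$, and for all points $x$ in $R_1$ except possibly for a set of zero probability, a similar $\theta$-derivative statement holds for $S_n(x_1, \ldots, x_n; \theta)$ for all points $(x_1, \ldots, x_n)$ in $R_n$, except possibly for a set of zero probability. If $\hat\theta$ is unique for $n \ge$ some $n_0$, we have seen from 12.3.2 that $\hat\theta$ converges almost certainly to $\theta_0$ as $n \to \infty$. Furthermore, if $\hat\theta(x_1, \ldots, x_n)$ is a random variable, then for arbitrary $\delta > 0$ and $\epsilon > 0$ there is an $n(\delta, \epsilon, n_0)$ and a set $E_n$ in $R_n$ defined by $|\theta_0 - \hat\theta(x_1, \ldots, x_n)| < \delta$ so that

(12.3.5)   $P((x_1, \ldots, x_n) \in E_n \text{ for all } n > n(\delta, \epsilon, n_0) \mid \theta_0) > 1 - \epsilon.$

In $E_n$

(12.3.6)   $\frac{1}{\sqrt{n}} S_n(x_1, \ldots, x_n; \theta_0) = \frac{1}{\sqrt{n}} S_n(x_1, \ldots, x_n; \hat\theta) + \frac{1}{\sqrt{n}} S_n'(x_1, \ldots, x_n; \theta^*)(\theta_0 - \hat\theta)$

where $\theta^*(x_1, \ldots, x_n)$ is a random variable satisfying

(12.3.7)   $|\theta_0 - \theta^*| < |\theta_0 - \hat\theta|$

and where

(12.3.8)   $S_n'(x_1, \ldots, x_n; \theta^*) = \sum_{\xi=1}^{n} S'(x_\xi; \theta^*)$

and

(12.3.9)   $S'(x; \theta) = \frac{\partial}{\partial\theta} S(x; \theta).$

But (12.3.5) and (12.3.6) together are equivalent to the statement

(12.3.10)   $P\left( \frac{1}{\sqrt{n}} S_n(x_1, \ldots, x_n; \theta_0) = \frac{1}{\sqrt{n}} S_n(x_1, \ldots, x_n; \hat\theta) + \frac{1}{n} S_n'(x_1, \ldots, x_n; \theta^*) \cdot [\sqrt{n}(\theta_0 - \hat\theta)] \text{ for all } n > n(\delta, \epsilon, n_0) \;\middle|\; \theta_0 \right) > 1 - \epsilon.$

Since $\hat\theta$, by definition, satisfies (12.3.2), we see that (12.3.10) reduces to

(12.3.11)   $P\left( \frac{1}{\sqrt{n}} S_n(x_1, \ldots, x_n; \theta_0) = \frac{1}{n} S_n'(x_1, \ldots, x_n; \theta^*) [\sqrt{n}(\theta_0 - \hat\theta)] \text{ for all } n > n(\delta, \epsilon, n_0) \;\middle|\; \theta_0 \right) > 1 - \epsilon,$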

which implies that

(12.3.12)   $\frac{1}{\sqrt{n}} S_n(x_1, \ldots, x_n; \theta_0), \quad n = 1, 2, \ldots$

and

(12.3.13)   $\frac{1}{n} S_n'(x_1, \ldots, x_n; \theta^*) [\sqrt{n}(\theta_0 - \hat\theta)], \quad n = 1, 2, \ldots$

are sequences of random variables, which converge together in distribution if either converges in distribution. Now

(12.3.14)   $\frac{1}{n} S_n'(x_1, \ldots, x_n; \theta_0) = \frac{1}{n} \sum_{\xi=1}^{n} S'(x_\xi; \theta_0)$

which is the mean of a sample of size $n$ from a population having mean

(12.3.15)   $\int_{-\infty}^{\infty} S'(x; \theta_0) \, dF(x; \theta_0) = -B^2(\theta_0, \theta_0)$

and hence by 9.1.1 the expression on the left of (12.3.14) converges in probability to the expression given in (12.3.15). Now it follows from 4.3.7 that since $\hat\theta$ converges in probability to $\theta_0$, so does $\theta^*$. Therefore, by applying 4.3.8, we conclude that $(1/n) S_n'(x_1, \ldots, x_n; \theta^*)$ converges in probability to the expression on the right of (12.3.15).

Finally, by applying 4.3.6 we conclude that the sequences of random variables given in (12.3.12) and (12.3.13) and the sequence

(12.3.16)   $\sqrt{n}(\hat\theta - \theta_0) B^2(\theta_0, \theta_0), \quad n = 1, 2, \ldots$

converge together in distribution if any one of the three does. But we know from 12.3.1 that the sequence (12.3.12) converges in distribution to

(12.3.17)   $N(0, B^2(\theta_0, \theta_0)).$

Therefore, for large $n$, the asymptotic distribution of $\hat\theta$ is

(12.3.18)   $N\left(\theta_0, \frac{1}{nB^2(\theta_0, \theta_0)}\right)$

thus establishing 12.3.3. The asymptotic distribution of $\hat\theta$ given by (12.3.18) was originally stated by Fisher (1922).


(d) Asymptotic Efficiency of Maximum Likelihood Estimators 

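Suppose $\tilde\theta(x_1, \ldots, x_n)$ is a biased estimator for $\theta_0$ with bias $b_n(\theta_0)$ defined in (12.2.16). If $\tilde\theta$ is a consistent estimator for $\theta_0$ its bias $b_n(\theta_0)$ converges to zero as $n \to \infty$.

For any given $n$, we have seen that the lower bound of $\sigma^2(\tilde\theta \mid \theta_0)$ is given by (12.2.17). From this, it follows that

(12.3.19)   $\sigma^2(\sqrt{n}\,\tilde\theta \mid \theta_0) = \mathcal{E}(n[\tilde\theta - \theta_0 - b_n(\theta_0)]^2 \mid \theta_0) \ge \frac{[1 + b_n'(\theta_0)]^2}{B^2(\theta_0, \theta_0)}.$

Since the inequality holds for every $n$, we have, assuming $b_n'(\theta_0) \to 0$ as $n \to \infty$,

(12.3.20)   $\liminf_{n \to \infty} \sigma^2(\sqrt{n}\,\tilde\theta \mid \theta_0) \ge \lim_{n \to \infty} \left( \frac{[1 + b_n'(\theta_0)]^2}{B^2(\theta_0, \theta_0)} \right) = \frac{1}{B^2(\theta_0, \theta_0)}$

that is,

(12.3.21)   $\liminf_{n \to \infty} \sigma^2(\sqrt{n}\,\tilde\theta \mid \theta_0) \ge \frac{1}{B^2(\theta_0, \theta_0)}.$

Therefore we have the following result:

12.3.5 Let $(x_1, \ldots, x_n)$ be a sample from the c.d.f. $F(x; \theta_0)$, where $F(x; \theta)$ is regular with respect to its first $\theta$-derivative and $B^2(\theta, \theta)$ exists for $\theta$ in $\Omega_0$. If $\tilde\theta$ is a consistent estimator for $\theta_0$, and if the $\theta$-derivative of the bias of $\tilde\theta$ has zero as its limit as $n \to \infty$, the least upper bound of $\sigma^2(\sqrt{n}\,\tilde\theta)$ as $n \to \infty$ cannot be less than $1/B^2(\theta_0, \theta_0)$.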

An important case of a consistent estimator for $\theta_0$ is the maximum likelihood estimator $\hat\theta$ under the conditions of 12.3.3. Also we have from 12.3.3 that the variance of the limiting distribution of $\sqrt{n}(\hat\theta - \theta_0)$ is

(12.3.22)   $\frac{1}{B^2(\theta_0, \theta_0)}.$

If the variance of the limiting distribution of $\sqrt{n}(\tilde\theta - \theta_0)$ is $1/B^{*2}(\theta_0, \theta_0)$, we shall say that the limiting or asymptotic efficiency of $\tilde\theta$ as $n \to \infty$ is defined as

(12.3.23)   $\mathrm{leff}(\tilde\theta \mid \theta_0) = \frac{B^{*2}(\theta_0, \theta_0)}{B^2(\theta_0, \theta_0)}.$

Thus

(12.3.24)   $\mathrm{leff}(\hat\theta \mid \theta_0) = 1.$

Any estimator $\tilde\theta$ for $\theta_0$ for which $\mathrm{leff}(\tilde\theta \mid \theta_0) = 1$ is said to be an asymptotically efficient estimator for $\theta_0$, and this is true of $\hat\theta$. Hence,

12.3.6 Under the assumptions of 12.3.3 the maximum likelihood estimator $\hat\theta$ for $\theta_0$ is consistent and asymptotically efficient.

The reader should note the distinction between efficiency as defined by (12.2.15) and asymptotic efficiency which is defined by (12.3.23). The former definition holds for any $n$, but only when an efficient estimator exists. The latter is a property which is defined as a limit for $n \to \infty$ and holds when the asymptotically efficient estimator $\hat\theta$ exists, that is, under the assumptions of 12.3.3. The asymptotic efficiency of a consistent estimator $\tilde\theta$ may also be viewed as the large-sample extension of Fisher's notion of amount of information in $\tilde\theta$ concerning $\theta_0$.

Further detailed studies of asymptotic properties of maximum likelihood estimators have been carried out by Bahadur (1958), Barankin and Gurland (1951), Kraft and LeCam (1956), Rao (1949), Wald (1948), and others.

Example. To illustrate the asymptotic efficiency of an estimator, suppose $(x_1, \ldots, x_n)$ is a sample from the normal distribution $N(\mu, \sigma^2)$. Let $\tilde\mu$ be the median and $\hat\mu$ the mean of the sample. It follows from 9.6.6 that the asymptotic distribution of $\tilde\mu$ for large $n$ is $N(\mu, \pi\sigma^2/2n)$. It can be verified by the reader that an efficient estimator $\hat\mu$ for $\mu$ exists for any value of $n$ and that $\hat\mu$ is the sample mean. Furthermore, we know from 8.3.3d that the distribution of the sample mean is the normal distribution $N(\mu, \sigma^2/n)$. Therefore applying (12.3.23) the asymptotic efficiency of the sample median $\tilde\mu$ in estimating the true value of $\mu$ is given by

$\mathrm{leff}(\tilde\mu \mid \mu) = \frac{2}{\pi} \approx 0.637.$

This means that for large $n$ the mean of a sample of size $(2/\pi)n$ will estimate the true value of the mean $\mu$ of a normal distribution $N(\mu, \sigma^2)$ with the same degree of precision approximately as the median of a sample of size $n$, no matter what the true values of $\mu$ and $\sigma^2$ may be.
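The value $2/\pi$ can be illustrated by simulation. The following sketch is illustrative only: it draws repeated samples from $N(0, 1)$, records the sample mean and sample median of each, and reports the ratio of their empirical variances. The function name, the seed, the sample size 101, and the 4000 replications are arbitrary assumptions; for finite $n$ the ratio is only approximately $2/\pi$ and is subject to simulation error.

```python
import random
import statistics

def simulated_efficiency_of_median(n, replications, seed=0):
    """Empirical var(sample mean) / var(sample median) for N(0, 1) samples of size n."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(replications):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means) / statistics.pvariance(medians)

ratio = simulated_efficiency_of_median(n=101, replications=4000)
print(ratio)  # to be compared with 2/pi, roughly 0.637
```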

(e) Function of a Maximum Likelihood Estimator Having Asymptotic Variance $1/n$

Referring to 12.3.3 it will be noted that under the regularity conditions placed on $F(x; \theta)$ the maximum likelihood estimator $\hat\theta$ has as its asymptotic variance for large $n$, the quantity $1/[nB^2(\theta_0, \theta_0)]$ where $\theta_0$ is the true value of $\theta$. A question of considerable theoretical and practical interest is whether one can use as a new parameter $\zeta$ some function of $\theta$, say $\zeta(\theta)$, such that the asymptotic variance of $\zeta(\hat\theta)$ is simply $1/n$. A function of this kind can be found under certain conditions. To determine $\zeta(\theta)$ we proceed as follows. Let $\zeta(\theta)$ be a function of $\theta$ having a derivative $\zeta'(\theta)$ and unique inverse $\theta(\zeta)$ in $\Omega_0$ and let $\theta(\zeta)$ have a derivative $\theta'(\zeta)$ with respect to $\zeta$. Let $\zeta_0$ be the true value of $\zeta$, that is, $\theta(\zeta_0) = \theta_0$. Referring to (12.1.8) we may write

(12.3.25)   $B^2(\theta, \theta) = \int_{-\infty}^{\infty} [S(x; \theta)]^2 \, dF(x; \theta).$

Since

(12.3.26)   $S(x; \theta) = \frac{\partial}{\partial\theta} \log dF(x; \theta)$

we may write

(12.3.27)   $S(x; \theta(\zeta)) = \frac{\partial}{\partial\zeta} \log dF(x; \theta(\zeta)) \cdot \zeta'(\theta).$

Substituting in (12.3.25), we have

(12.3.28)   $B^2(\theta(\zeta), \theta(\zeta)) = [\zeta'(\theta)]^2 \int_{-\infty}^{\infty} \left[ \frac{\partial}{\partial\zeta} \log dF(x; \theta(\zeta)) \right]^2 dF(x; \theta(\zeta)).$

Requiring the maximum likelihood estimator $\hat\zeta$ (for $\zeta_0$) to have an asymptotic variance of $1/n$ for large $n$, is equivalent to requiring the integral on the right of (12.3.28) to have the value 1. Therefore, the function $\zeta(\theta)$ must satisfy the differential equation

(12.3.29)   $[\zeta'(\theta)]^2 = B^2(\theta, \theta)$

or

(12.3.30)   $\zeta(\theta) = \int_{\theta_0}^{\theta} B(\theta, \theta) \, d\theta + \zeta_0, \qquad B(\theta, \theta) = +\sqrt{B^2(\theta, \theta)}.$

Summarizing, we obtain the following result:

12.3.7 If $(x_1, \ldots, x_n)$ is a sample from the c.d.f. $F(x; \theta_0)$ where $F(x; \theta)$ is regular with respect to its second $\theta$-derivative, and if $\hat\theta$ is the maximum likelihood estimator for $\theta_0$, then $\zeta(\hat\theta)$, where $\zeta(\theta)$ is given by (12.3.30) and has a derivative $\zeta'(\theta)$ and unique inverse $\theta(\zeta)$ in $\Omega_0$, is asymptotically distributed according to the normal distribution $N(\zeta_0, 1/n)$, for large $n$.


Example. Suppose $(x_1, \ldots, x_n)$ is a sample from the Poisson distribution whose p.f. is

$p(x; \mu_0) = \frac{\mu_0^x e^{-\mu_0}}{x!}, \qquad x = 0, 1, \ldots$

We wish to determine what function of $\mu$, say $\zeta(\mu)$, is such that $\zeta(\hat\mu)$ has as its asymptotic distribution the normal distribution $N(\zeta(\mu_0), 1/n)$ as $n \to \infty$. We know from the example in Section 12.2(c) that the maximum likelihood estimator $\hat\mu$ for $\mu_0$ is the sample mean $\bar{x}$. It can be verified by applying formula (12.3.25) to the Poisson distribution (that is, where $dF(x; \mu) = p(x; \mu)$) that

$B^2(\mu, \mu) = \sum_{x=0}^{\infty} \left( \frac{x}{\mu} - 1 \right)^2 p(x; \mu) = \frac{1}{\mu}.$

Applying (12.3.30) and choosing $\zeta_0 = 2\sqrt{\mu_0}$ we find

$\zeta(\mu) = 2\sqrt{\mu}.$

Therefore, we conclude from 12.3.7 that if $(x_1, \ldots, x_n)$ is a sample from a population having the Poisson distribution $Po(\mu_0)$, then $2\sqrt{\bar{x}}$ has, as its asymptotic distribution for large $n$, the normal distribution $N(2\sqrt{\mu_0}, 1/n)$.
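The claim that $2\sqrt{\bar{x}}$ has asymptotic variance $1/n$ can be checked without simulation, since $n\bar{x}$ is Poisson with mean $n\mu_0$. The sketch below is an illustrative check under that fact: it computes $n \cdot \sigma^2(2\sqrt{\bar{x}})$ by a truncated sum over the p.f. of $n\bar{x}$ and shows it approaching 1 as $n$ grows. The function names, the cutoff rule, and the value $\mu_0 = 4$ are assumptions.

```python
import math

def poisson_pf(t, mean):
    """Poisson p.f. evaluated in log space."""
    return math.exp(t * math.log(mean) - mean - math.lgamma(t + 1))

def scaled_variance_of_transform(mu, n):
    """n * Var(2*sqrt(xbar)), where n*xbar ~ Poisson(n*mu), by a truncated sum."""
    lam = n * mu
    cutoff = int(lam + 40 * math.sqrt(lam) + 50)  # far into the upper tail
    m1 = m2 = 0.0
    for t in range(cutoff):
        p = poisson_pf(t, lam)
        z = 2.0 * math.sqrt(t / n)
        m1 += z * p
        m2 += z * z * p
    return n * (m2 - m1 ** 2)

for n in (10, 100, 1000):
    print(n, scaled_variance_of_transform(4.0, n))  # values should approach 1
```

Note that the limiting value 1 does not depend on $\mu_0$, which is the practical point of the transformation.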


12.4 INTERVAL ESTIMATION 
(a) Definition of Confidence Interval 

Suppose $(x_1, \ldots, x_n)$ is a random variable having c.d.f. $F_n(x_1, \ldots, x_n; \theta)$. Let $\underline\theta(x_1, \ldots, x_n)$, $\bar\theta(x_1, \ldots, x_n)$ be two functions (random variables) of the sample elements such that $\underline\theta < \bar\theta$. If $\underline\theta$ and $\bar\theta$ can be chosen so that, for a given $\gamma$

(12.4.1)   $P(\underline\theta < \theta < \bar\theta \mid \theta) = \gamma$

where $P(\underline\theta < \theta < \bar\theta \mid \theta)$ denotes the indicated probability as determined from $F_n(x_1, \ldots, x_n; \theta)$, then $(\underline\theta, \bar\theta)$ is called a $100\gamma\%$ confidence interval for $\theta$, whereas $\underline\theta$ and $\bar\theta$ are called lower and upper confidence limits for $\theta$, and $\gamma$ is called the confidence coefficient. Note that $(\underline\theta, \bar\theta)$ is a two-dimensional random variable such that the probability is $\gamma$ that the interval $(\underline\theta, \bar\theta)$ contains the true value of $\theta$ in $F_n(x_1, \ldots, x_n; \theta)$. Examples of confidence intervals have already been given in Sections 10.2(c), 10.4, 10.5, and 10.6, in the case of sampling from normal distributions. The mathematical formality of a confidence interval was first introduced by Laplace (1814) in dealing with the problem of inferring the value of $p$ in the binomial distribution (6.2.2) from an observed value of the random variable $x$ of the distribution. Laplace regarded the confidence interval as fixed and $p$ as a random variable. Laplace's procedure was rediscovered by Wilson (1927) who gave the correct interpretation of the interval as a random interval. The development of the modern theory and terminology of confidence intervals is due to Neyman (1937).

(b) Procedure for Constructing Confidence Intervals from Samples from 
Continuous c.d.f.’s 

A procedure by which confidence intervals can be constructed in certain cases from a sample from a continuous c.d.f. $F(x; \theta)$ can be stated as follows:

12.4.1 Suppose $(x_1, \ldots, x_n)$ is a sample from a continuous c.d.f. $F(x; \theta)$. Suppose $g(x_1, \ldots, x_n; \theta)$: (i) is defined at every point $\theta$ in an interval $(\theta_1, \theta_2)$ containing $\theta_0$, and at every point in the sample space $R_n$, except possibly for a set of probability zero; (ii) is continuous and monotonically increasing or decreasing in $\theta$; and (iii) has a c.d.f. that does not depend on $\theta$. Let $(g_1, g_2)$ be an interval for which $P(g_1 < g < g_2 \mid \theta) = \gamma$. Then if $\theta_0$ is the true value of $\theta$, the solutions $\underline\theta$, $\bar\theta$ (where $\underline\theta < \bar\theta$) of the equations $g(x_1, \ldots, x_n; \theta) = g_1, g_2$ exist, and $(\underline\theta, \bar\theta)$ is a $100\gamma\%$ confidence interval for $\theta_0$.

To establish 12.4.1 note that since the c.d.f. of g(xi, ..., 0) does not 

depend on 0, we can, for a given 0 < y < 1 and a gi /en true value 0o, 
choose gi and gj (in many ways, of course) so that if 6q ij the true value of 0 

^gl < gip^v • • • . *n; ®o) < gi I ®o) = Y- 


(12.4.2) 
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Since gix^,... ,x„-, 6) is continuous and monotonically increasing (or 
decreasing) in 0 over 6^, it is evidei;. that (12.4.2) is equivalent to the 
statement 


(12.4.3) p(0<eo< e|0o) = y 

where 0 < 0 are the solutions of the equations • • •, ®) = ^i> ^2 

and, of course, are random variables. 

One may ask whether it is always possible to find a function
$g(x_1, \ldots, x_n; \theta)$ whose c.d.f. is independent of $\theta$,
assuming that $F(x; \theta)$ is continuous in $x$. Such a function can always
be found. In fact, the form taken by $F_n(x_1, \ldots, x_n; \theta)$ for
simple random sampling, namely $\prod_{\xi=1}^{n} F(x_\xi; \theta)$, is such a
function if $F(x; \theta)$ is a continuous monotonically increasing or
decreasing function in $\theta$ for all points in the space of $x$, except
possibly for a set of probability zero. For the random variables
$F(x_\xi; \theta)$, $\xi = 1, 2, \ldots, n$, are independent and, as pointed
out in Section 8.7, each is distributed according to the rectangular
distribution $R(\tfrac{1}{2}, 1)$. Furthermore, $-\log F(x_\xi; \theta)$ has
the gamma distribution $G(1)$, and it follows from the reproductive property
of the gamma distribution that $-\sum_{\xi=1}^{n} \log F(x_\xi; \theta)$ has
the gamma distribution $G(n)$. Therefore

(12.4.4)    $P\left(-\log b_2 < -\sum_{\xi=1}^{n} \log F(x_\xi; \theta_0) < -\log b_1 \,\middle|\, \theta_0\right) = \frac{1}{\Gamma(n)} \int_{-\log b_2}^{-\log b_1} y^{n-1} e^{-y}\, dy.$

If $b_1$ and $b_2$ are chosen so that the integral on the right is $\gamma$,
then we have

(12.4.5)    $P\left(b_1 < \prod_{\xi=1}^{n} F(x_\xi; \theta_0) < b_2 \,\middle|\, \theta_0\right) = \gamma.$

Now if $F(x; \theta)$ is continuous and monotonically increasing (or
decreasing) in $\theta$ for all points $x$ in $R_1$ (except possibly for a set
of zero probability), then the same statement holds for
$\prod_{\xi=1}^{n} F(x_\xi; \theta)$. Therefore the inequalities in the
probability statement (12.4.5) can be inverted and written in the form

(12.4.6)    $P(\underline{\theta} < \theta_0 < \bar{\theta} \mid \theta_0) = \gamma$

thus providing a $100\gamma\%$ confidence interval for $\theta_0$.
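
As an illustration of the gamma-distributed function used above, the sketch
below takes the family $F(x; \theta) = x^{\theta}$ on $(0, 1)$, $\theta > 0$.
This family and all function names are assumptions made for the example. Here
$-\sum \log F(x_\xi; \theta) = \theta T$ with $T = -\sum \log x_\xi$, so the
inversion of the inequalities is explicit. The c.d.f. of $G(n)$ for integer
$n$ is evaluated in closed form and inverted by bisection.

```python
import math

def erlang_cdf(y, n):
    """C.d.f. of the gamma distribution G(n) for integer n >= 1:
    (1 / Gamma(n)) * integral from 0 to y of t**(n-1) * exp(-t) dt."""
    if y <= 0.0:
        return 0.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= y / k
        total += term
    return 1.0 - math.exp(-y) * total

def erlang_quantile(p, n, tol=1e-12):
    """Solve erlang_cdf(y, n) = p for y by bisection."""
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, n) < p:
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def power_family_interval(sample, gamma=0.95):
    """Interval for theta when F(x; theta) = x**theta on (0, 1)
    (an illustrative family, assumed for this sketch).

    -sum(log F(x_i; theta)) = theta * T, with T = -sum(log x_i), has the
    distribution G(n) and is increasing in theta.
    """
    n = len(sample)
    t = -sum(math.log(x) for x in sample)
    alpha = 1.0 - gamma
    y1 = erlang_quantile(alpha / 2.0, n)        # plays the role of -log b2
    y2 = erlang_quantile(1.0 - alpha / 2.0, n)  # plays the role of -log b1
    return y1 / t, y2 / t
```

For a family in which the sum is not linear in $\theta$, the two end points
would instead be found by solving the two equations numerically.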

The reader should observe that if the assumption of monotonicity of
$g(x_1, \ldots, x_n; \theta)$ in 12.4.1 is removed, (12.4.2) is still valid,
but the inversion of the inequalities in (12.4.2) yields a random set $E$ in
$(\theta_1, \theta_2)$, where $E$ depends on $(x_1, \ldots, x_n)$, instead of
the confidence interval $(\underline{\theta}, \bar{\theta})$. Then (12.4.3)
would be replaced by $P(\theta_0 \in E \mid \theta_0) = \gamma$. It should be
carefully noted that what is random is $E$ and not $\theta_0$. From the
viewpoint of the estimation of $\theta_0$, a random interval
$(\underline{\theta}, \bar{\theta})$ is more appealing and more useful than a
random set $E$. The extension of the idea of interval estimation of
$\theta_0$ to set estimation of $\theta_0$ will be useful in Section 12.8
when we consider the problem of extending interval estimation to a
multidimensional parameter $\theta$.

(c) Confidence Intervals from Samples from Discrete Distributions 

In case $F(x; \theta)$ is the c.d.f. of a discrete random variable, then, of
course, 12.4.1 is not applicable; that is, one cannot find confidence
intervals having confidence coefficients exactly equal to $\gamma$, where
$\gamma$ is arbitrary. However, under certain conditions one can find
confidence intervals having confidence coefficients not less than $\gamma$,
although the situation is less elegant than that for the case of a continuous
random variable. More specifically,

12.4.2 Suppose $(x_1, \ldots, x_n)$ is a sample from the c.d.f.
$F(x; \theta)$, where $x$ is a discrete random variable and $\theta$ is a
parameter whose space is an interval $(\theta_1, \theta_2)$. Let
$\tilde{\theta}$ be an estimator for $\theta$ defined at every mass point in
the sample space $R_n$ and lying in $(\theta_1, \theta_2)$. Let
$V(\tilde{\theta}; \theta)$ be the c.d.f. of $\tilde{\theta}$ and
$V^*(\tilde{\theta}; \theta) = 1 - V(\tilde{\theta}; \theta)$. Furthermore let
$V(\tilde{\theta}; \theta)$ be continuous and decreasing in $\theta$ at each
mass point of $\tilde{\theta}$ so that
$\lim_{\theta \to \theta_1} V(\tilde{\theta}; \theta) = 1$,
$\lim_{\theta \to \theta_2} V(\tilde{\theta}; \theta) = 0$ for all
$\tilde{\theta} \in (\theta_1, \theta_2)$. Let $\bar{\theta}$ and
$\underline{\theta}$ be the values of $\theta$ for which
$V(\tilde{\theta}; \theta) = \gamma_1$ and
$V^*(\tilde{\theta}; \theta) = \gamma_1^*$, respectively, where $\gamma_1$ and
$\gamma_1^*$ are non-negative and
$0 < \gamma = 1 - \gamma_1 - \gamma_1^* < 1$. Then
$(\underline{\theta}, \bar{\theta})$ is a confidence interval for $\theta_0$
with confidence coefficient $\ge \gamma$.

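To establish 12.4.2, let $\tilde{\theta}_1$ be the largest value of
$\tilde{\theta}$ for which $V(\tilde{\theta}; \theta_0) \le \gamma_1$ and
$\tilde{\theta}_2$ the smallest value of $\tilde{\theta}$ for which
$V^*(\tilde{\theta}; \theta_0) \le \gamma_1^*$. Then

(12.4.7)    $P(\tilde{\theta}_1 < \tilde{\theta} < \tilde{\theta}_2) \ge \gamma$

where $\gamma = 1 - \gamma_1 - \gamma_1^*$. But since
$V(\tilde{\theta}; \theta)$ is monotonically decreasing in $\theta$ and
nondecreasing in $\tilde{\theta}$, and $V^*(\tilde{\theta}; \theta)$ is
monotonically increasing in $\theta$ and nonincreasing in $\tilde{\theta}$, it
is evident that $\tilde{\theta}_1 < \tilde{\theta} < \tilde{\theta}_2$ if and
only if $V(\tilde{\theta}; \theta_0) > \gamma_1$,
$V^*(\tilde{\theta}; \theta_0) > \gamma_1^*$, that is, if and only if
$\underline{\theta} < \theta_0 < \bar{\theta}$. Therefore

(12.4.8)    $P(\underline{\theta} < \theta_0 < \bar{\theta} \mid \theta_0) \ge \gamma$

and hence $(\underline{\theta}, \bar{\theta})$ is a confidence interval for
$\theta_0$ with confidence coefficient $\ge \gamma$.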

Example. As an illustrative example, consider a sample $(x_1, \ldots, x_n)$
from a population having a binomial distribution $Bi(1, p_0)$. The p.f. of
this distribution is

    $p(x) = p_0^{x}(1 - p_0)^{1-x}$,

the mass points being $x = 0, 1$, and $p_0$ is a number on $(0, 1)$. The
sample mean $\bar{x}$ has the binomial distribution $Bi(n, p_0)$ over its
sample space $0, 1/n, \ldots, n/n$, and is a consistent estimator for $p_0$.
Furthermore, when the population being sampled has the binomial distribution
$Bi(1, p)$, the c.d.f. of $\bar{x}$ is defined by

    $V(\bar{x}; p) = \sum_{j=0}^{n\bar{x}} \binom{n}{j} p^{j}(1 - p)^{n-j}$

and $V^*(\bar{x}; p)$ is defined by

    $V^*(\bar{x}; p) = 1 - V(\bar{x}; p)$.

Now $V(\bar{x}; p)$ is monotonically decreasing in $p$ and $V^*(\bar{x}; p)$
is monotonically increasing in $p$ for $0 < n\bar{x} < n$, since
$(d/dp)V(\bar{x}; p) < 0$ and $(d/dp)V^*(\bar{x}; p) > 0$. Furthermore
$V(\bar{x}; 0) = 1$ and $V(\bar{x}; 1) = 0$ for all $\bar{x} \in (0, 1)$.
Therefore, if we apply 12.4.2 it is evident that if $\bar{p}$ and
$\underline{p}$ are the solutions of $V(\bar{x}; p) = \gamma_1$ and
$V^*(\bar{x}; p) = \gamma_1^*$ respectively, then if $p_0$ is the true value
of $p$ in the population, and if $\gamma = 1 - \gamma_1 - \gamma_1^*$, we have

    $P(\underline{p} < p_0 < \bar{p} \mid p_0) \ge \gamma$,

that is, $(\underline{p}, \bar{p})$ is a confidence interval for $p_0$ with
confidence coefficient $\ge \gamma$.

Confidence intervals of this type for $p$ have been constructed, presented
graphically and published by Clopper and Pearson (1934) for the case when
$\gamma_1 = \gamma_1^* = 0.05, 0.025$, and for various values of $n$ ranging
from 10 to 1000.
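
Limits of this kind can be computed directly rather than read from charts.
The following is a minimal sketch, with function names chosen for
illustration. It assumes the usual Clopper-Pearson convention for the
probability mass at the observed point: the upper limit solves
$P(X \le k \mid p) = \gamma_1$ and the lower limit solves
$P(X \ge k \mid p) = \gamma_1^*$, where $k = n\bar{x}$ is the observed number
of successes. Both equations are solved by bisection, using the monotonicity
in $p$ noted above.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X distributed as Bi(n, p)."""
    return sum(math.comb(n, j) * p**j * (1.0 - p)**(n - j)
               for j in range(k + 1))

def solve_decreasing(f, target, tol=1e-12):
    """Bisection for f(p) = target on (0, 1), where f is decreasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def clopper_pearson(k, n, gamma1=0.025, gamma1_star=0.025):
    """Interval for p from k successes in n trials.

    Upper limit: P(X <= k | p) = gamma1.
    Lower limit: P(X >= k | p) = gamma1_star (assumed convention).
    """
    if k == n:
        upper = 1.0
    else:
        upper = solve_decreasing(lambda p: binom_cdf(k, n, p), gamma1)
    if k == 0:
        lower = 0.0
    else:
        # P(X >= k | p) = 1 - P(X <= k - 1 | p).
        lower = solve_decreasing(lambda p: binom_cdf(k - 1, n, p),
                                 1.0 - gamma1_star)
    return lower, upper
```

With $\gamma_1 = \gamma_1^* = 0.025$ the resulting interval has confidence
coefficient at least 0.95 for every $p_0$, in line with (12.4.8).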

A procedure similar to that used in the preceding Example has been
applied by Garwood (1936) and Ricker (1937) to the problem of
determining confidence intervals for the parameter of a Poisson distribution.

The procedure stated in 12.4.2 can be extended to the case in which
the parameter space consists of a discrete set of points or some other subset
of the points in the interval $(\theta_1, \theta_2)$. This situation arises,
for example, in setting up confidence intervals for $p$ in the case of a
random variable $x$ having the hypergeometric distribution $H(N, n; p)$ as
defined in Section 6.1(a), where $N$ and $n$ are known. In this case the
parameter space $\Omega$ for $p$ consists of the discrete points
$0, 1/N, 2/N, \ldots, 1$. Confidence intervals for $p$ in this case have been
computed by Chung and DeLury (1950) for $N = 500$, 2500, 10,000,
$n/N = 0.05$, 0.10 (0.10) 0.90, and for $\gamma = 0.90$, 0.95, 0.99.

(d) Remarks Concerning Fiducial Probability and Fiducial Intervals 

A notion introduced by Fisher (1935b) and related to that of
confidence intervals is the concept of a fiducial interval. Suppose the
parameter space of $\theta$ is the interval $(\theta_1, \theta_2)$ and let $s$
be a statistic whose sample space is $(a, b)$. Let $v(s; \theta)$ be the
p.d.f. and $V(s; \theta)$ the c.d.f. of $s$. If
$V^*(\theta; s) = 1 - V(s; \theta)$ is continuous and strictly monotonically
increasing in $\theta$ for each $s$ in $(a, b)$ so that
$\lim_{\theta \to \theta_1} V^*(\theta; s) = 0$ and
$\lim_{\theta \to \theta_2} V^*(\theta; s) = 1$ for each $s$ in $(a, b)$, then
$V^*(\theta; s)$ as a function of $\theta$ on $(\theta_1, \theta_2)$ has all
the formal properties of a c.d.f. for each $s$ in $(a, b)$ and is called the
fiducial c.d.f. of $\theta$, based on the statistic $s$. Thus, if $\theta'$
and $\theta''$, $\theta' < \theta''$, are chosen so that
$V^*(\theta'; s) = \gamma_1$ and $V^*(\theta''; s) = \gamma_2$, where
$\gamma_2 - \gamma_1 = \gamma$ and $0 < \gamma < 1$, we may write the
following fiducial probability statement:

(12.4.9)    fid $P(\theta' < \theta < \theta'' \mid s) = \gamma$

so that $(\theta', \theta'')$ would be called a $100\gamma\%$ fiducial
interval for $\theta$. If $\theta_0$ is the true value of $\theta$, and if we
let $s'$ and $s''$ be the values of $s$ satisfying the equations
$V(s; \theta_0) = 1 - \gamma_2$ and $V(s; \theta_0) = 1 - \gamma_1$,
respectively, where $\gamma_2 - \gamma_1 = \gamma$, it will be seen that
$P(s' < s < s'' \mid \theta_0) = \gamma$. But $\theta_0$ lies in
$(\theta', \theta'')$ if and only if $s$ lies in $(s', s'')$. Thus,
$(\theta', \theta'')$ is also a $100\gamma\%$ confidence interval for
$\theta_0$.

Another procedure for generating a fiducial distribution and fiducial
intervals is by means of pivotal functions. Thus, for the statistic $s$ and
parameter $\theta$ referred to above, suppose $g(s; \theta)$ is a strictly
monotonically increasing function of $\theta$ for each $s$ and has a similar
property as a function of $s$ for each $\theta$, such that $g(s; \theta)$ has
a probability element $h(g)\,dg$ which does not depend on $\theta$ except
through $g$. If $g(s; \theta)$ has first partial derivatives with respect to
$s$ and with respect to $\theta$, then $h(g)(\partial g/\partial s)\,ds$ is
the probability element of the random variable $s$, and
$h(g)(\partial g/\partial \theta)\,d\theta$ is the fiducial probability
element of the parameter $\theta$. Thus, let $g_1$ and $g_2$ ($g_1 < g_2$) be
numbers such that

(12.4.10)    $P(g_1 < g(s; \theta) < g_2 \mid \theta) = \gamma.$

Let $s_1$ and $s_2$ ($s_1 < s_2$) be values of $s$ for which
$g(s; \theta) = g_1, g_2$ respectively, whereas $\theta_1$ and $\theta_2$,
$\theta_1 < \theta_2$, are values of $\theta$ for which
$g(s; \theta) = g_1, g_2$ respectively. Then $\theta \in (\theta_1, \theta_2)$
if and only if $s \in (s_1, s_2)$. Therefore

    $\int_{\theta_1}^{\theta_2} h(g)\frac{\partial g}{\partial \theta}\,d\theta = \int_{s_1}^{s_2} h(g)\frac{\partial g}{\partial s}\,ds$

where the left side is the fiducial probability
fid $P(\theta_1 < \theta < \theta_2)$ whereas the right side is the
probability $P(s_1 < s < s_2)$. A similar treatment can be given if
$g(s; \theta)$ is strictly monotonically increasing in $s$ but decreasing in
$\theta$ and vice versa, and when $g(s; \theta)$ is decreasing in both
variables.

Remark. In the two procedures mentioned above for generating fiducial
distributions, fiducial intervals and confidence intervals are equivalent.
There are some situations, however, when fiducial intervals are not equivalent
to confidence intervals, and this has given rise to controversy as to what
interpretation should be placed on a fiducial interval. This controversy
centers on the meaning of the formal fiducial distribution of the difference
$\mu_1 - \mu_2$ of means of two normal distributions $N(\mu_1, \sigma_1^2)$
and $N(\mu_2, \sigma_2^2)$, the statistics involved being the sample mean and
sample variance of a sample from each distribution. More precisely, suppose
$n_1, \bar{x}_1, s_1^2$ and $n_2, \bar{x}_2, s_2^2$ are the size, mean, and
variance of independent samples from $N(\mu_1, \sigma_1^2)$ and
$N(\mu_2, \sigma_2^2)$ respectively. Then

(12.4.11)    $t_1 = \sqrt{n_1}(\bar{x}_1 - \mu_1)/s_1, \qquad t_2 = \sqrt{n_2}(\bar{x}_2 - \mu_2)/s_2$

are independent Student ratios having $n_1 - 1$ and $n_2 - 1$ degrees of
freedom respectively. If $f_m(t)\,dt$ is the probability element of a Student
ratio $t$ having $m$ degrees of freedom, the probability element of $t_1$ and
$t_2$ is $f_{n_1-1}(t_1) f_{n_2-1}(t_2)\,dt_1\,dt_2$. Using the quantities
denoted by $t_1$ and $t_2$ in (12.4.11) as pivotal quantities, one obtains as
the fiducial probability element of $(\mu_1, \mu_2)$ for given
$n_1, \bar{x}_1, s_1^2, n_2, \bar{x}_2, s_2^2$ the following:

(12.4.12)    $f_{n_1-1}\!\left(\sqrt{n_1}(\bar{x}_1 - \mu_1)/s_1\right) f_{n_2-1}\!\left(\sqrt{n_2}(\bar{x}_2 - \mu_2)/s_2\right) \left[\sqrt{n_1 n_2}/(s_1 s_2)\right] d\mu_1\, d\mu_2.$

From (12.4.12) one can formally find the fiducial probability distribution of
$(\mu_1 - \mu_2)$ and from this distribution fiducial intervals for
$(\mu_1 - \mu_2)$. But it can be shown that there exists no function
$g(\bar{x}_1, \bar{x}_2, s_1^2, s_2^2, \mu_1 - \mu_2)$, satisfying certain
regularity conditions, whose distribution depends only on $n_1$ and $n_2$, and
from which one can obtain confidence intervals for $\mu_1 - \mu_2$. Thus in
this example fiducial intervals appear to be different from confidence
intervals, and the question is how one should interpret fiducial intervals in
this case.

This is known as the Behrens-Fisher problem [see Fisher (1935b)]. The reader
interested in further details concerning this problem should consult Tukey
(1957c) and the references at the end of his paper.


12.5 INTERVAL ESTIMATION FROM LARGE SAMPLES 


(a) Interval Estimates from Likelihood Estimating Function 


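The asymptotic theory of interval estimation of population parameters
from large samples follows fairly directly from the theory of point
estimation from large samples as presented in Section 12.3. Asymptotically
efficient point estimation implies, as we shall see, asymptotically shortest
interval estimation in the case of large samples.

Suppose $(x_1, \ldots, x_n)$ is a sample from a distribution having c.d.f.
$F(x; \theta_0)$. It will be convenient to define the likelihood estimating
function $h_n(x_1, \ldots, x_n; \theta)$ as

(12.5.1)    $h_n(x_1, \ldots, x_n; \theta) = \frac{S_n(x_1, \ldots, x_n; \theta)}{n B(\theta_0, \theta)}$

where $B(\theta_0, \theta) = +\sqrt{B^2(\theta_0, \theta)}$. Under the
assumptions of 12.3.1, we know $\sqrt{n}\,h_n(x_1, \ldots, x_n; \theta_0)$ has
as its limiting distribution as $n \to \infty$ the distribution $N(0, 1)$ if
the true value of $\theta$ is $\theta_0$. If we adopt the stronger conditions
of 12.3.3, then since the probability exceeds $1 - \epsilon$ that for all $n$
exceeding some $n_\epsilon$,

    $\sqrt{n}\,h_n(x_1, \ldots, x_n; \theta_0) = h'_n(x_1, \ldots, x_n; \theta^*)[\sqrt{n}(\theta_0 - \hat{\theta})]$

for points $(x_1, \ldots, x_n)$ in the set defined in (12.3.5), where
$h'_n(x_1, \ldots, x_n; \theta) = (\partial/\partial\theta) h_n(x_1, \ldots, x_n; \theta)$
and $\hat{\theta}$ and $\theta^*$ are as defined in (12.3.7), the two
sequences of random variables

(12.5.2)    $\sqrt{n}\,h_n(x_1, \ldots, x_n; \theta_0), \qquad n = 1, 2, \ldots$

and

(12.5.3)    $h'_n(x_1, \ldots, x_n; \theta^*)[\sqrt{n}(\theta_0 - \hat{\theta})], \qquad n = 1, 2, \ldots$

converge together in distribution to the normal distribution $N(0, 1)$.
Furthermore, if $(\partial/\partial\theta)B^2(\theta_0, \theta)$ exists and is
bounded over $\Omega_0$, and if $\theta = \theta_0$, then

(12.5.4)    $h'_n(x_1, \ldots, x_n; \theta^*)$ and $h'_n(x_1, \ldots, x_n; \hat{\theta})$

converge in probability to $B(\theta_0, \theta_0)$ as $n \to \infty$.
Therefore, we may write

    $\lim_{n \to \infty} P\left[-z_\gamma < h'_n(x_1, \ldots, x_n; \hat{\theta})[\sqrt{n}(\theta_0 - \hat{\theta})] < +z_\gamma \mid \theta_0\right] = \gamma$

where $\Phi(z_\gamma) - \Phi(-z_\gamma) = \gamma$, from which it is seen that

(12.5.5)    $\lim_{n \to \infty} P\left[\hat{\theta} - \frac{z_\gamma}{\sqrt{n}\,h'_n(x_1, \ldots, x_n; \hat{\theta})} < \theta_0 < \hat{\theta} + \frac{z_\gamma}{\sqrt{n}\,h'_n(x_1, \ldots, x_n; \hat{\theta})} \,\middle|\, \theta_0\right] = \gamma.$

We shall say that the interval having endpoints
$\hat{\theta} \pm \frac{z_\gamma}{\sqrt{n}\,h'_n(x_1, \ldots, x_n; \hat{\theta})}$
is an asymptotic $100\gamma\%$ confidence interval for $\theta_0$ for large
$n$; its length is
$\frac{2 z_\gamma}{\sqrt{n}\,h'_n(x_1, \ldots, x_n; \hat{\theta})}$.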
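
A minimal numerical sketch of an interval of the form (12.5.5) follows,
under assumptions that are not taken from the text: the sample comes from an
exponential distribution with p.d.f. $\theta e^{-\theta x}$, for which the
maximum likelihood estimator is $1/\bar{x}$ and the information per
observation is $1/\theta^2$. The derivative $h'_n$ at the estimate is replaced
by the square root of the information evaluated at the estimate, which is its
large-sample value in absolute size, so the end points become
$\hat{\theta} \pm z_\gamma \hat{\theta}/\sqrt{n}$. The function name is
hypothetical.

```python
import math
from statistics import NormalDist

def exponential_rate_interval(sample, gamma=0.95):
    """Asymptotic interval for the rate theta of an exponential
    distribution (illustrative model, assumed for this sketch)."""
    n = len(sample)
    theta_hat = n / sum(sample)                  # maximum likelihood estimate
    # z_gamma satisfies Phi(z) - Phi(-z) = gamma.
    z = NormalDist().inv_cdf(0.5 + gamma / 2.0)
    # Square root of the information per observation at theta_hat,
    # standing in for the limit of h'_n.
    b_hat = 1.0 / theta_hat
    half_length = z / (math.sqrt(n) * b_hat)
    return theta_hat - half_length, theta_hat + half_length
```

The length of this interval shrinks at the rate $1/\sqrt{n}$, as the
expression for the length given above indicates.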


(b) Interval Estimates from General Estimating Functions 

We shall show that the ratio of the squared length of this interval to that
of a similar interval obtained by using an arbitrary estimating function
$g_n(x_1, \ldots, x_n; \theta)$, satisfying certain conditions, converges in
probability to a number which cannot exceed 1.

Let $(x_1, \ldots, x_n)$ be a sample from $F(x; \theta)$ which is regular in
its first two $\theta$-derivatives, and let $F_n$ denote
$\prod_{\xi=1}^{n} F(x_\xi; \theta)$, and $F_{n0}$ the value of $F_n$ at
$\theta = \theta_0$.

Let

(12.5.6)    $g_n(x_1, \ldots, x_n; \theta)$

be a random variable such that:

(12.5.7)    (i) $\int_{R_n} [\sqrt{n}\,g_n(x_1, \ldots, x_n; \theta)]^j\, dF_n = 0, 1$; $j = 1, 2$,
    for $\theta$ in $\Omega_0$;

    (ii) $\sqrt{n}\,g_n(x_1, \ldots, x_n; \theta_0)$ has $N(0, 1)$ as its
    limiting distribution as $n \to \infty$ if $\theta_0$ is the true value of
    $\theta$;

    (iii) $g_n(x_1, \ldots, x_n; \theta)$ has a continuous $\theta$-derivative
    $g'_n(x_1, \ldots, x_n; \theta)$ for $\theta$ in $\Omega_0$ for all points
    in the sample space except possibly for a set of zero probability;

    (iv) if $\theta_0$ is the true value of $\theta$,
    $g'_n(x_1, \ldots, x_n; \theta)$ converges in probability to
    $B^*(\theta_0, \theta)$ uniformly with respect to $\theta$ in $\Omega_0$,
    where $B^*(\theta_0, \theta)$ is bounded on $\Omega_0$ and
    $B^*(\theta_0, \theta_0) \ne 0$;

    (v) if

        $A_n^*(\theta, \theta') = \int_{R_n} g_n(x_1, \ldots, x_n; \theta')\, dF_n$

    and

        $B_n^*(\theta, \theta') = \int_{R_n} g'_n(x_1, \ldots, x_n; \theta')\, dF_n$,

    then

(12.5.8)    $\frac{\partial}{\partial\theta'} A_n^*(\theta, \theta') = B_n^*(\theta, \theta')$, for $(\theta, \theta')$ in $\Omega_0 \times \Omega_0$;
            $A_n^*(\theta, \theta) = 0$, for $\theta$ in $\Omega_0$;
            $\lim_{n \to \infty} B_n^*(\theta, \theta) = B^*(\theta, \theta)$, for $\theta$ in $\Omega_0$.

Any function $g_n(x_1, \ldots, x_n; \theta)$ having the properties listed
above will be called a regular estimating function for $\theta_0$. It is to be
noted that the function $h_n(x_1, \ldots, x_n; \theta)$ defined by (12.5.1) is
a regular estimating function for $\theta_0$.

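It can be verified, by following a line of argument similar to that by
which 12.3.2 was established, that if, for $n >$ some $n_0$, the equation
$g_n(x_1, \ldots, x_n; \theta) = 0$ has a unique root, then the equations

(12.5.9)    $g_n(x_1, \ldots, x_n; \theta) = 0, \qquad n = n_0, n_0 + 1, \ldots$

have a sequence of roots

(12.5.10)    $\tilde{\theta}(x_1, \ldots, x_n), \qquad n = n_0, n_0 + 1, \ldots$

which converges in probability to $\theta_0$. Furthermore the two sequences of
random variables

(12.5.11)    $\sqrt{n}\,g_n(x_1, \ldots, x_n; \theta_0), \qquad n = n_0, n_0 + 1, \ldots$

(12.5.12)    $g'_n(x_1, \ldots, x_n; \tilde{\theta}^*)[\sqrt{n}(\theta_0 - \tilde{\theta})], \qquad n = n_0, n_0 + 1, \ldots$

where $\tilde{\theta}^*$ satisfies
$|\theta_0 - \tilde{\theta}^*| < |\theta_0 - \tilde{\theta}|$, converge
together in distribution to the normal distribution $N(0, 1)$. But it follows
from 4.3.8 that

(12.5.13)    $g'_n(x_1, \ldots, x_n; \tilde{\theta}^*)$ and $g'_n(x_1, \ldots, x_n; \tilde{\theta})$

converge in probability to $B^*(\theta_0, \theta_0)$ as $n \to \infty$.
Therefore, as in (12.5.5), we have the following statement:

(12.5.14)    $\lim_{n \to \infty} P\left[\tilde{\theta} - \frac{z_\gamma}{\sqrt{n}\,g'_n(x_1, \ldots, x_n; \tilde{\theta})} < \theta_0 < \tilde{\theta} + \frac{z_\gamma}{\sqrt{n}\,g'_n(x_1, \ldots, x_n; \tilde{\theta})} \,\middle|\, \theta_0\right] = \gamma.$

Thus, $\tilde{\theta} \pm \frac{z_\gamma}{\sqrt{n}\,g'_n(x_1, \ldots, x_n; \tilde{\theta})}$
are end points of an asymptotic $100\gamma\%$ confidence interval for
$\theta_0$ for large $n$, whose length is
$\frac{2 z_\gamma}{\sqrt{n}\,g'_n(x_1, \ldots, x_n; \tilde{\theta})}$.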


(c) Asymptotically Shortest Confidence Intervals 
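The ratio $r_n$ of the squared length of the confidence interval in (12.5.5)
to that in (12.5.14) is

(12.5.15)    $r_n = c_n^2 d_n^2 \left[\frac{g'_n(x_1, \ldots, x_n; \theta_0)}{h'_n(x_1, \ldots, x_n; \theta_0)}\right]^2$

where

(12.5.16)    $c_n = \frac{h'_n(x_1, \ldots, x_n; \theta_0)}{h'_n(x_1, \ldots, x_n; \hat{\theta})}$

and

(12.5.17)    $d_n = \frac{g'_n(x_1, \ldots, x_n; \tilde{\theta})}{g'_n(x_1, \ldots, x_n; \theta_0)}.$

Now $c_n$ and $d_n$ are random variables. But it follows from 4.3.7 that
the numerator and denominator of each converge in probability to a common
limit ($B(\theta_0, \theta_0)$ for $c_n$ and $B^*(\theta_0, \theta_0)$ for
$d_n$), and hence each converges in probability to 1. Similarly,
$[g'_n(x_1, \ldots, x_n; \theta_0)/h'_n(x_1, \ldots, x_n; \theta_0)]^2$, and
hence $r_n$, converges in probability to

(12.5.18)    $\frac{B^{*2}(\theta_0, \theta_0)}{B^2(\theta_0, \theta_0)}$

which we wish to show cannot exceed 1.

Since (12.5.7) holds for all $n$, and (12.5.8) can be used, we can
differentiate (12.5.7) for $j = 1$ with respect to $\theta$ under the integral
sign, and obtain at $\theta = \theta_0$

(12.5.19)    $\int_{R_n} g'_n(x_1, \ldots, x_n; \theta_0)\, dF_{n0} = -\int_{R_n} g_n(x_1, \ldots, x_n; \theta_0)\, S_n(x_1, \ldots, x_n; \theta_0)\, dF_{n0}.$

Now the squares of these integrals are equal. But the square of the first
integral is $B_n^{*2}(\theta_0, \theta_0)$. Applying Schwarz' inequality to the
second integral, and using the fact that the integral in (12.5.7) for $j = 2$
is unity, we obtain

(12.5.20)    $B_n^{*2}(\theta_0, \theta_0) \le \frac{1}{n} \int_{R_n} S_n^2(x_1, \ldots, x_n; \theta_0)\, dF_{n0}.$

But the right-hand side reduces to $B^2(\theta_0, \theta_0)$ which does not
depend on $n$. Hence

    $\frac{B_n^{*2}(\theta_0, \theta_0)}{B^2(\theta_0, \theta_0)} \le 1$

and taking limits as $n \to \infty$, we have

(12.5.21)    $\frac{B^{*2}(\theta_0, \theta_0)}{B^2(\theta_0, \theta_0)} \le 1.$

Thus, the ratio of squared lengths of confidence intervals converges in
probability to a number $\le 1$, and we shall say that the confidence
intervals defined in (12.5.5) are asymptotically shortest $100\gamma\%$
confidence intervals for $\theta_0$ for large $n$.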

We may summarize as follows: 

12.5.1 Suppose $(x_1, \ldots, x_n)$ is a random sample from the c.d.f.
$F(x; \theta_0)$, where $F(x; \theta)$ is regular with respect to its second
$\theta$-derivative for $\theta$ in $\Omega_0$. Then if
$g_n(x_1, \ldots, x_n; \theta)$ is any regular estimating function for
$\theta_0$, asymptotic $100\gamma\%$ confidence limits for $\theta_0$ for
large $n$ are provided by (12.5.14). Furthermore, there exists no regular
estimating function for $\theta_0$ which provides an asymptotically shorter
$100\gamma\%$ confidence interval for $\theta_0$ than that produced by the
likelihood estimating function $h_n(x_1, \ldots, x_n; \theta)$ defined by
(12.5.1). Thus the asymptotically shortest confidence interval is given by
(12.5.5).
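
The limiting ratio (12.5.18) can be evaluated in closed form for a simple
case. The sketch below rests on assumptions that are not in the text: the
population is exponential with mean $\theta$, so the information per
observation is $1/\theta^2$, and the competing estimating function is built
from $\psi(x; \theta) = x^2 - 2\theta^2$, which has mean zero. Dividing the
sample mean of $\psi$ by the standard deviation of $\psi$ gives a function
satisfying (12.5.7). Its $\theta$-derivative converges in probability to
$E[\partial\psi/\partial\theta]$ divided by that standard deviation, because
the remaining term of the derivative is proportional to the sample mean of
$\psi$ and so tends to zero.

```python
import math

def exp_moment(k, theta):
    """E[x**k] for the exponential distribution with mean theta."""
    return math.factorial(k) * theta ** k

def squared_length_ratio(theta):
    """Limit of the squared-length ratio for an illustrative estimating
    function based on psi(x; theta) = x**2 - 2*theta**2 (assumed example)."""
    # Variance of psi: E[x^4] - (E[x^2])^2 = 20 * theta**4.
    var_psi = exp_moment(4, theta) - exp_moment(2, theta) ** 2
    d_psi = -4.0 * theta                  # derivative of psi in theta
    b_star_sq = d_psi ** 2 / var_psi      # square of the limit of g'_n
    b_sq = 1.0 / theta ** 2               # information per observation
    return b_star_sq / b_sq
```

Under these assumptions the ratio is 0.8 for every $\theta$, so the interval
from the likelihood estimating function is asymptotically about
$\sqrt{0.8} \approx 0.89$ times as long as the competing interval.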

The equivalence of the problem of asymptotically shortest confidence
intervals and that of asymptotic efficiency of estimation can be seen by
noting that the asymptotic efficiency of the estimator $\tilde{\theta}$ in
(12.5.10) as computed by formula (12.3.23) is merely the ratio on the left of
(12.5.21). That is, the square root of the asymptotic efficiency of the
estimator $\tilde{\theta}$ for $\theta_0$ as defined in (12.5.10) is equal to
the limit (in probability) of the ratio of length of the asymptotic
$100\gamma\%$ confidence interval provided by the likelihood estimating
function $h_n(x_1, \ldots, x_n; \theta)$ to length of the corresponding
confidence interval yielded by $g_n(x_1, \ldots, x_n; \theta)$.

The problem of asymptotically shortest confidence intervals for the case
where $g_n(x_1, \ldots, x_n; \theta)$ is of the form
$\sum_{\xi=1}^{n} g(x_\xi; \theta)$ was considered by Wilks (1938b), and for
the more general case by Wald (1942).


12.6 MULTIDIMENSIONAL POINT ESTIMATION 

(a) Introductory Remarks 

The results obtained in Sections 12.2 through 12.5 pertain to a sample
from a c.d.f. $F(x; \theta)$, in which $\theta$ is one-dimensional. As pointed
out at the beginning of Section 12.1, the results of those sections remain
valid with minor changes in notation if the sample is from a $k$-dimensional
c.d.f. $F(x_1, \ldots, x_k; \theta)$ in which $\theta$ is one-dimensional.

Now, if $\theta$ is $r$-dimensional with components
$\theta_1, \ldots, \theta_r$, the basic results we have obtained have
$r$-dimensional versions which may not be immediately evident to the reader.
We shall state some $r$-dimensional results without giving details of proof
in all cases. The reader who is sufficiently familiar with the results already
given for the case of a one-dimensional parameter should have no particular
difficulty in furnishing these details.

Throughout this section, we shall consider simple random sampling from
a c.d.f. $F(x; \theta)$ where $\theta$ is $r$-dimensional and for convenience
$x$ is taken to be one-dimensional. The results can be extended with only
minor modifications of notation to the case where $x$ is $k$-dimensional. The
components of $\theta$ will be $(\theta_1, \ldots, \theta_r)$. The true value
$(\theta_{10}, \ldots, \theta_{r0})$ of $\theta$ will be denoted by
$\theta_0$. Corresponding to the one-dimensional interval $\Omega_0$ we will
have an $r$-dimensional (open) rectangle $\Omega_{r0}$ containing $\theta_0$.

An estimator $(\tilde{\theta}_1, \ldots, \tilde{\theta}_r)$ for $\theta_0$
will be denoted by $\tilde{\theta}$. An unbiased estimator $\tilde{\theta}$
for $\theta_0$ is one for which $\tilde{\theta}_p$ is an unbiased estimator
for $\theta_{p0}$, $p = 1, \ldots, r$. A set of sufficient statistics for
estimating $\theta$ has been defined in Section 12.2(e). A consistent
estimator $\tilde{\theta}$ for $\theta_0$ is one for which $\tilde{\theta}_p$
converges in probability to $\theta_{p0}$ as $n \to \infty$.

(b) Efficiency of a Multidimensional Estimator 

We shall now show how to extend the notion of efficiency to the case of
multidimensional estimators. In this extension, we shall adopt the notation
of Section 12.1(b). Furthermore, let

( 12 . 6 . 1 ) 

• • • > *n> ~ (*1 .*n> — 2 


^PQtX^li • • • > *n» •••>*>!> ®)- 

dO, 


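If $\tilde{\theta}$ is an unbiased estimator of some $\theta$ in
$\Omega_{r0}$ and if $c'_p$, $p = 1, \ldots, r$, are arbitrary constants we
may write

(12.6.2)    $\int_{R_n} \sum_{p=1}^{r} (\tilde{\theta}_p - \theta_p) c'_p \, dF_n = 0$

which, of course, also states that $\sum_{p=1}^{r} c'_p \tilde{\theta}_p$ is
an unbiased estimator for $\sum_{p=1}^{r} c'_p \theta_p$. We shall assume that
the components $\tilde{\theta}_p$ are linearly independent.

If $F(x; \theta)$ is regular with respect to all of its first
$\theta$-derivatives, we can differentiate (12.6.2) with respect to
$\theta_q$, obtaining

(12.6.3)    $\int_{R_n} \sum_{p=1}^{r} (\tilde{\theta}_p - \theta_p) c'_p \cdot S_{qn}(x_1, \ldots, x_n; \theta)\, dF_n = c'_q$

for $q = 1, \ldots, r$. Multiplying both sides of (12.6.3) by $c_q$, summing
both sides with respect to $q$, squaring both sides of the resulting equation,
and then applying Schwarz' inequality to the squared integral on the left, we
obtain for $\theta$ in $\Omega_{r0}$

(12.6.4)    $\left(\sum_{p} c_p c'_p\right)^2 \le \left\{\int_{R_n} \left[\sum_{p} (\tilde{\theta}_p - \theta_p) c'_p\right]^2 dF_n\right\} \cdot \left\{\int_{R_n} \left[\sum_{q} S_{qn}(x_1, \ldots, x_n; \theta) c_q\right]^2 dF_n\right\}.$

But the quantity in the first { } on the right is
$\sigma^2\left(\sum_p c'_p \tilde{\theta}_p \mid \theta\right)$ and that in
the second { } is found to be
$n\sigma^2\left(\sum_p S_p(x; \theta) c_p \mid \theta\right)$. Therefore, at
$\theta = \theta_0$, (12.6.4) reduces to

(12.6.5)    $\sigma^2\left(\sum_p c'_p \tilde{\theta}_p \,\middle|\, \theta_0\right) \ge \underset{(c_1, \ldots, c_r)}{\text{l.u.b.}} \frac{\left(\sum_p c_p c'_p\right)^2}{n\sigma^2\left(\sum_p S_p(x; \theta) c_p \mid \theta_0\right)}$

which, incidentally, furnishes a lower bound for the variance of the
unbiased estimator $\sum_{p=1}^{r} c'_p \tilde{\theta}_p$ for
$\sum_{p=1}^{r} c'_p \theta_{p0}$.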






As will be seen at the place in (12.6.4) where the Schwarz inequality was
applied, the equality sign holds if and only if a linear dependence of form

(12.6.6)    $K(\theta) \sum_{p} (\tilde{\theta}_p - \theta_p) c'_p \equiv \sum_{p} S_{pn}(x_1, \ldots, x_n; \theta) c_p$

exists for all points in the $x$-space except possibly for a set of
probability zero, where $K(\theta)$ does not depend on $(x_1, \ldots, x_n)$
and where neither the set $c_1, \ldots, c_r$ nor the set
$c'_1, \ldots, c'_r$ vanishes identically. If an unbiased estimator for
$\theta_0$ exists so that (12.6.6) is satisfied for $\theta$ in
$\Omega_{r0}$, we call it an efficient estimator.

Let the covariance matrix of $\sqrt{n}\,\tilde{\theta}_p$,
$p = 1, \ldots, r$, at $\theta = \theta_0$ be denoted by $\|B'^{pq}\|$, and
the covariance matrix of $S_{pn}/\sqrt{n}$ at $\theta = \theta_0$ by
$\|B_{pq}\|$ as defined in Section 12.1(b). The covariance of
$\sqrt{n}\,\tilde{\theta}_p$ and $S_{qn}/\sqrt{n}$ is $\delta_{pq}$, where
$\delta_{pq} = 1$ if $p = q$ and 0 if $p \ne q$, as will be seen from
(12.6.3). We may then write (12.6.4) as

(12.6.7)    $\left(\sum_{p,q} B'^{pq} c'_p c'_q\right)\left(\sum_{p,q} B_{pq} c_p c_q\right) \ge \left(\sum_{p,q} \delta_{pq} c_p c'_q\right)^2$

with the equality holding only when (12.6.6) holds; assuming, of course,
that neither the set $c_1, \ldots, c_r$ nor the set $c'_1, \ldots, c'_r$
vanishes identically.

Now the inequality in (12.6.7) holds only when the covariance matrix
of the random variables
$\sqrt{n}\,\tilde{\theta}_1, \ldots, \sqrt{n}\,\tilde{\theta}_r, S_{1n}/\sqrt{n}, \ldots, S_{rn}/\sqrt{n}$
is positive definite, which implies that

(12.6.8)    $|B'^{pq}| \cdot |B_{pq}| > |\delta_{pq}| = 1$

and the equality in (12.6.7) holds only when the covariance matrix of the
same random variables is made positive semidefinite by the linear dependence
expressed by (12.6.6), which implies that

(12.6.9)    $|B'^{pq}| \cdot |B_{pq}| = 1.$

In other words, the inequality (equality) of (12.6.7) holds only when the
inequality (equality) of

(12.6.10)    $|B'^{pq}| \cdot |B_{pq}| \ge 1$

holds. Since the components $\tilde{\theta}_1, \ldots, \tilde{\theta}_r$ are
linearly independent, we know by 3.5.2 that $\|B'^{pq}\|$ is positive
definite and hence $|B'^{pq}| > 0$. For similar reasons $|B_{pq}| > 0$.

These facts suggest that, as a generalization of (12.2.15), we may define
the following ratio as the efficiency of the $r$-dimensional unbiased
estimator $\tilde{\theta}$ of $\theta_0$:

(12.6.11)    $\text{eff}(\tilde{\theta} \mid \theta_0) = \frac{1}{|B'^{pq}| \cdot |B_{pq}|}.$

Thus $\text{eff}(\tilde{\theta} \mid \theta_0) = 1$ for an efficient
estimator.

We may summarize the preceding results as follows:

12.6.1 Let $(x_1, \ldots, x_n)$ be a sample from the c.d.f.
$F(x; \theta_0)$, where $\theta_0$ is $r$-dimensional, and let $F(x; \theta)$
be regular with respect to its second $\theta$-derivatives. If
$\tilde{\theta}$ is an unbiased estimator for $\theta_0$ with components not
linearly dependent whose covariance matrix exists and is positive definite,
then, for any two sets of constants $c_1, \ldots, c_r$ and
$c'_1, \ldots, c'_r$, not all constants in either set being zero,
inequalities (12.6.5) and (12.6.10) hold. Furthermore, in both cases equality
holds if and only if $\tilde{\theta}$ is an efficient estimator, that is, if
and only if (12.6.6) holds.
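
A small worked case may help fix ideas. The sketch below treats the
efficiency as the reciprocal of the product of the two determinants in
(12.6.10) and applies it to the unbiased estimator formed by the sample mean
and the unbiased sample variance, for sampling from a normal distribution
with parameter vector $(\mu, \sigma^2)$. The choice of example, the standard
normal-theory variances used, and the function names are assumptions made for
illustration.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def efficiency_mean_variance(n, sigma2=1.0):
    """Efficiency of (sample mean, unbiased sample variance) as an
    estimator of (mu, sigma^2) under normal sampling (illustrative)."""
    # Covariance matrix of sqrt(n) * (xbar, s^2): xbar and s^2 are
    # independent, var(xbar) = sigma^2 / n, var(s^2) = 2 sigma^4 / (n - 1).
    cov_est = [[sigma2, 0.0],
               [0.0, 2.0 * sigma2 ** 2 * n / (n - 1)]]
    # Covariance matrix of the score divided by sqrt(n), that is, the
    # information matrix per observation for (mu, sigma^2).
    info = [[1.0 / sigma2, 0.0],
            [0.0, 1.0 / (2.0 * sigma2 ** 2)]]
    return 1.0 / (det2(cov_est) * det2(info))
```

Under these assumptions the efficiency works out to $(n - 1)/n$, which is
below 1 for every finite $n$ and tends to 1 as $n$ increases.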

12.7 MULTIDIMENSIONAL POINT ESTIMATION FROM 
LARGE SAMPLES 

(a) Asymptotic Distribution of the Score 

The score for the $r$-dimensional vector $\theta$ is an $r$-dimensional
vector whose components are $S_{pn}(x_1, \ldots, x_n; \theta)$,
$p = 1, \ldots, r$, as defined by (12.6.1). These components have an
asymptotic $r$-dimensional normal distribution for large $n$ under conditions
analogous to those stated in 12.3.1. More precisely,

12.7.1 Suppose $(x_1, \ldots, x_n)$ is a sample from the c.d.f.
$F(x; \theta_0)$, where $\theta_0$ is $r$-dimensional. Let $F(x; \theta)$ be
regular with respect to its first $\theta$-derivatives in $\Omega_{r0}$. Then
if $\|B_{pq}(\theta, \theta)\|$, $p, q = 1, \ldots, r$, is positive definite
for $\theta$ in $\Omega_{r0}$,
$(S_{pn}(x_1, \ldots, x_n; \theta_0), p = 1, \ldots, r)$ is asymptotically
distributed for large $n$ according to the $r$-dimensional distribution
$N(\{0\}, \|nB_{pq}\|)$ where $B_{pq} = B_{pq}(\theta_0, \theta_0)$.

The proof of this statement is similar to that of the one-dimensional
case given by 12.3.1 and is left as an exercise for the reader.

(b) Convergence of the Maximum Likelihood Estimator

As in the one-dimensional case, if there exists no unbiased estimator
$\tilde{\theta}$ for $\theta_0$ whose components satisfy (12.6.6) for each
$n$, it can be shown under certain conditions that there is a sequence of
solutions of the equations

(12.7.1)    $S_{pn}(x_1, \ldots, x_n; \theta) = 0, \qquad p = 1, \ldots, r$

which converges to $\theta_0$ with probability 1.

More precisely, we have the following $r$-dimensional version of 12.3.2:

12.7.2 Suppose $(x_1, \ldots, x_n)$ is a sample from the c.d.f.
$F(x; \theta_0)$, where $\theta_0 = (\theta_{10}, \ldots, \theta_{r0})$ is
$r$-dimensional, and where $F(x; \theta)$ is regular with respect to its first
$\theta$-derivatives in $\Omega_{r0}$. Let $S_p(x; \theta)$,
$p = 1, \ldots, r$, be a continuous function of $\theta$ in $\Omega_{r0}$ for
all values of $x$ in $R_1$ except possibly for a set of zero probability.
Then, there exists a sequence of solutions of (12.7.1) which converges almost
certainly to $(\theta_{10}, \ldots, \theta_{r0})$. If the solution is a unique
vector $(\hat{\theta}_1, \ldots, \hat{\theta}_r)$ for $n >$ some $n_0$, the
sequence of vectors converges almost certainly to
$(\theta_{10}, \ldots, \theta_{r0})$ as $n \to \infty$.

The proof of 12.7.2 is a fairly straightforward extension of that given
for 12.3.2 and is omitted.

(c) Asymptotic Distribution of the Maximum Likelihood Estimator 
The $r$-dimensional version of 12.3.3 can be stated as follows:

12.7.3 If $(x_1, \ldots, x_n)$ is a sample from the c.d.f. $F(x; \theta_0)$,
where $\theta_0$ is $r$-dimensional and $F(x; \theta)$ is regular with respect
to its first and second $\theta$-derivatives for $\theta$ in $\Omega_{r0}$,
and if the maximum likelihood estimator
$(\hat{\theta}_1, \ldots, \hat{\theta}_r)$ satisfying (12.7.1) is unique for
$n \ge$ some $n_0$, and measurable with respect to
$\prod_{\xi=1}^{n} F(x_\xi; \theta)$, then it is asymptotically distributed
for large $n$ according to the $r$-dimensional normal distribution
$N(\{\theta_{p0}\}; \|nB_{pq}\|^{-1})$ where
$\|B_{pq}\| = \|B_{pq}(\theta_0, \theta_0)\|$.

The line of argument for 12.7.3 follows closely that of 12.3.3 and is
omitted.

(d) Asymptotic Efficiency of the Maximum Likelihood Estimator

In $r$-dimensional estimation, $\tilde{\theta}$ is a consistent estimator for
$\theta_0$ if each component of $\tilde{\theta}$ is a consistent estimator of
the corresponding component of $\theta_0$.

If $\tilde{\theta}$ is a consistent estimator for $\theta_0$ whose components
$\tilde{\theta}_p$, $p = 1, \ldots, r$, are such that

    $(\sqrt{n}(\tilde{\theta}_1 - \theta_{10}), \ldots, \sqrt{n}(\tilde{\theta}_r - \theta_{r0}))$

has a limiting normal $r$-dimensional distribution
$N(\{0\}; \|B'^{pq}\|)$ as $n \to \infty$, we define the asymptotic
efficiency of $\tilde{\theta}$ as

(12.7.2)    $\text{leff}(\tilde{\theta} \mid \theta_0) = \frac{|B_{pq}|^{-1}}{|B'^{pq}|}.$

lim aVr 2 e0,) - 2 

n-*oo . 


(12.7.3)

It will be seen from 12.7.3 that

    $(\sqrt{n}(\hat{\theta}_1 - \theta_{10}), \ldots, \sqrt{n}(\hat{\theta}_r - \theta_{r0}))$

has $N(\{0\}; \|B_{pq}\|^{-1})$ as its limiting distribution as
$n \to \infty$. Hence

(12.7.4)    $\text{leff}(\hat{\theta} \mid \theta_0) = \frac{|B_{pq}|^{-1}}{|B_{pq}|^{-1}} = 1.$

Therefore

12.7.4 Under the assumptions of 12.7.3, the maximum likelihood estimator
$\hat{\theta}$ has asymptotic efficiency 1 for estimating $\theta_0$.

As in the case of efficient estimation of one-dimensional parameters,
the earliest studies of efficient estimation of multidimensional parameters
were made by Fisher (1922). A more recent study of the subject based on
modern mathematical methods has been made by Barankin and Gurland
(1951).


12.8 MULTIDIMENSIONAL CONFIDENCE REGIONS 

In interval estimation of a one-dimensional parameter $\theta_0$, as
discussed in Section 12.4, the essential idea is that for a fixed confidence
coefficient $\gamma$ and for a sample $(x_1, \ldots, x_n)$ from a c.d.f.
$F(x; \theta)$ there exist random variables
$\underline{\theta} < \bar{\theta}$ such that

    $P(\underline{\theta} < \theta < \bar{\theta} \mid \theta) = \gamma.$

A method by which $\underline{\theta}$, $\bar{\theta}$ can be constructed
under certain conditions is provided by 12.4.1. As briefly pointed out in
Section 12.4(b), if the requirement of monotonicity of
$g_n(x_1, \ldots, x_n; \theta)$ in $\theta$ over $\Omega$ is removed in
12.4.1 then there exists a random set $E_1(x_1, \ldots, x_n)$ in $R_1$, which
may be written briefly as $E_1$, consisting of all real numbers $y$ in
$\Omega$ for which

(12.8.1)    $g_1 < g_n(x_1, \ldots, x_n; y) < g_2$

and not depending on $\theta_0$ such that

(12.8.2)    $P(g_1 < g_n(x_1, \ldots, x_n; \theta_0) < g_2 \mid \theta_0) = P(\theta_0 \in E_1 \mid \theta_0) = \gamma.$

In other words, the probability is $\gamma$ that $E_1$ contains $\theta_0$.
That $E_1$ is a Borel set follows from the fact that
$g_n(x_1, \ldots, x_n; \theta)$ is a random variable for all values of
$\theta$ in $\Omega$. That the probability of $E_1$ containing $\theta$ does
not depend on $\theta$ follows from the fact that the c.d.f. of
$g_n(x_1, \ldots, x_n; \theta)$ does not depend on $\theta$. It should be
noted particularly that $E_1$ is a random set in $\Omega$ having probability
$\gamma$ of containing $\theta_0$, assuming, of course, that the sample
$(x_1, \ldots, x_n)$ has been drawn from $F(x; \theta_0)$.
Therefore (12.8.2) provides an extension of the idea of interval estimation
to set estimation. This extension indicates that we can generalize
confidence interval estimation to the case of an $r$-dimensional parameter as
follows:


12.8.1 Suppose $(x_1, \ldots, x_n)$ is a sample from the c.d.f.
$F(x; \theta_0)$ where $\theta_0$ is $r$-dimensional. Let
$g_n(x_1, \ldots, x_n; \theta)$ be a random variable so defined at every point
$\theta$ in the parameter space $\Omega_r$ and at every point in sample space
$R_n$, except possibly for a set of probability zero, that when the c.d.f. is
$F(x; \theta)$, the c.d.f. of $g_n$ does not depend on $\theta$. Let
$E_r(x_1, \ldots, x_n) = E_r$, say, be the set of points
$y = (y_1, \ldots, y_r)$ in $\Omega_r$ for which
$g_1 < g_n(x_1, \ldots, x_n; y) < g_2$ where $g_1$ and $g_2$ are chosen so
that

(12.8.3)    $P(g_1 < g_n(x_1, \ldots, x_n; \theta) < g_2 \mid \theta) = \gamma$

where $\gamma$ does not depend on $\theta$. Then $E_r$ is an $r$-dimensional
random set such that for the true parameter point $\theta_0$, we have

(12.8.4)    $P(\theta_0 \in E_r \mid \theta_0) = \gamma.$

The proof of this statement is straightforward and is omitted. We shall
call the set $E_r$ a $100\gamma\%$ confidence region for $\theta_0$. An
example of a confidence region for estimating a $k$-dimensional parameter was
given in Section 10.4.

More generally, if $g_n(x_1, \ldots, x_n; \theta)$ is a vector function with
components $g_{p'n}(x_1, \ldots, x_n; \theta)$, $p' = 1, \ldots, r'$, whose
c.d.f. does not depend on $\theta$ when the population c.d.f. is
$F(x; \theta)$, then for any set $E^*_{r'}$ in $R_{r'}$ for which

(12.8.5)    $P((g_{1n}, \ldots, g_{r'n}) \in E^*_{r'}) = \gamma,$

the set $E_r$ consisting of all points $y = (y_1, \ldots, y_r)$ in
$\Omega_r$ for which

(12.8.6)    $(g_{1n}(x_1, \ldots, x_n; y), \ldots, g_{r'n}(x_1, \ldots, x_n; y)) \in E^*_{r'}$

is a $100\gamma\%$ confidence region for $\theta_0$, that is,

(12.8.7)    $P(\theta_0 \in E_r) = \gamma.$

Ordinarily, the most appealing, interesting and useful functions
$g_n(x_1, \ldots, x_n; \theta)$ are those which provide convex, or at least
simply connected, confidence regions (random sets) $E_r$, since such sets are
more likely to be easily described and used. For $r = r' = 1$, $E_1$ will
have such properties if $g_n(x_1, \ldots, x_n; \theta)$ is continuous and
monotonic in $\theta$ as stated in 12.4.1. For more general values of $r$ and
$r'$ the situation is more complicated.




But, in addition to the property of convexity or of being simply
connected, our intuition requires $E_r$ to be "smallest" in some sense. This,
however, is a complicated problem for small $n$. For large $n$, however, there
is a satisfactory asymptotic solution under certain conditions which we
shall consider presently.


Example. To illustrate the preceding ideas, suppose $(x_1,\ldots,x_n)$ is a sample
from $N(\mu_0, \sigma_0^2)$. Then we know from 8.4.1 that the function

$g = \dfrac{1}{\sigma_0^2}\left[(n-1)s^2 + n(\bar{x} - \mu_0)^2\right]$

has the chi-square distribution $C(n)$. Therefore for a given $\gamma$, we can find $\chi_\gamma^2$
so that

$P(g < \chi_\gamma^2) = \gamma$.

Denote by $E_\gamma$ the region in the $y_1y_2$-half-plane with $y_2 > 0$ which lies between
the two branches of the hyperbola having the equation

$\dfrac{y_2^2}{n} - \dfrac{(y_1 - \bar{x})^2}{\chi_\gamma^2} = \dfrac{(n-1)s^2}{n\chi_\gamma^2}$.

We see that if $(\mu_0, \sigma_0)$ is the true parameter point, then

$P((\mu_0, \sigma_0) \in E_\gamma) = \gamma$.

Now let us consider the two-dimensional vector function

$g_{p'n}(x_1,\ldots,x_n;\mu_0,\sigma_0)$, $p' = 1, 2$

where

$g_{1n} = \dfrac{n(\bar{x} - \mu_0)^2}{\sigma_0^2}$, $\qquad g_{2n} = \dfrac{(n-1)s^2}{\sigma_0^2}$.

It follows from 8.4.2 that $g_{1n}$ and $g_{2n}$ are independently distributed according to
chi-square distributions $C(1)$ and $C(n-1)$ respectively. If we choose $\chi_1^2$ and $\chi_2^2$
so that

$P(g_{1n} < \chi_1^2,\ g_{2n} > \chi_2^2) = \gamma$

and let $E_\gamma'$ be the region in the $y_1y_2$-plane for which

$\dfrac{n(y_1 - \bar{x})^2}{y_2^2} < \chi_1^2$, $\qquad \dfrac{(n-1)s^2}{y_2^2} > \chi_2^2$

where $y_2 > 0$, we have

$P((\mu_0, \sigma_0) \in E_\gamma') = \gamma$.

Thus, $E_\gamma$ and $E_\gamma'$ are examples of confidence regions determined by vector
estimating functions having one and two components respectively. From an
intuitive point of view, $E_\gamma'$ is more satisfactory than $E_\gamma$ since $E_\gamma'$ is bounded and
convex, whereas $E_\gamma$ has neither of these properties. If one chooses as
$g(x_1,\ldots,x_n;\mu_0,\sigma_0)$ the p.d.f. of $(\bar{x}, s^2)$, one can obtain a random region $E_\gamma''$
which is more satisfactory than $E_\gamma'$ on the criterion of being "smaller," but $E_\gamma''$ is
more complicated than $E_\gamma'$ and will not be described in detail. In case of large $n$,
there is an optimum method of determining an asymptotically smallest region
for estimating $(\mu_0, \sigma_0)$. The general problem of asymptotically smallest confidence
regions is considered in the next section.
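
The coverage property of a region built from two independent chi-square quantities, such as the bounded convex region of the example, can be checked numerically. The following Python sketch is an illustration and not part of the original text. It assumes the region is the set of points $(y_1, y_2)$, $y_2 > 0$, with $n(y_1 - \bar{x})^2 / y_2^2 < c_1$ and $(n-1)s^2 / y_2^2 > c_2$; it assumes the probability $\gamma$ is split equally between the two independent events (one admissible choice among many); and it restricts $n$ to odd values so that the $C(n-1)$ c.d.f. has a closed form. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def chi2_cdf_even(x, dof):
    """C.d.f. of the chi-square distribution for an even number of degrees of freedom."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= half / j          # (x/2)^j / j!
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_quantile_even(p, dof):
    """Invert chi2_cdf_even by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, dof) < p:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, dof) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def region_constants(n, gamma):
    """Constants c1, c2 with P(g1 < c1) = P(g2 > c2) = sqrt(gamma) (assumed split)."""
    root = math.sqrt(gamma)
    z = NormalDist().inv_cdf((1.0 + root) / 2.0)   # P(|Z| < z) = sqrt(gamma)
    c1 = z * z                                      # quantile of C(1)
    c2 = chi2_quantile_even(1.0 - root, n - 1)      # lower quantile of C(n - 1)
    return c1, c2


def in_region(sample, y1, y2, c1, c2):
    """True if the point (y1, y2) lies in the region determined by the sample."""
    n = len(sample)
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)       # equals (n - 1) * s^2
    return y2 > 0 and n * (y1 - mean) ** 2 < c1 * y2 ** 2 and ss > c2 * y2 ** 2


def estimated_coverage(mu0, sigma0, n, gamma, trials=20000, seed=1):
    """Monte Carlo estimate of P((mu0, sigma0) in region) for odd n."""
    rng = random.Random(seed)
    c1, c2 = region_constants(n, gamma)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma0) for _ in range(n)]
        hits += in_region(sample, mu0, sigma0, c1, c2)
    return hits / trials
```

For instance, `estimated_coverage(2.0, 3.0, 11, 0.90)` should return a value close to 0.90, up to simulation error.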


12.9 ASYMPTOTICALLY SMALLEST CONFIDENCE REGIONS 
FROM LARGE SAMPLES 


We shall now consider the r-dimensional extension of 12.5.1. 

It follows from 12.7.1 and 9.3.2 that if the true value of $\theta$ is $\theta_0$, the
sequence of random variables

(12.9.1) $U_n(x_1,\ldots,x_n;\theta_0) = n \sum_{p,q=1}^{r} B^{pq} h_p(\theta_0) h_q(\theta_0)$, $\quad n = 1, 2, \ldots$

where $h_p(\theta) = (1/n) S_{pn}(x_1,\ldots,x_n;\theta)$ as defined in (12.6.1), converges
in distribution to the chi-square distribution $C(r)$. Under the stronger
conditions of 12.7.3 we can apply the mean value theorem of differential
calculus to $U_n$ and write

$U_n^* \equiv U_n$ if the value of $\hat\theta$ belongs to

and

$U_n^* = 0$ if the value of $\hat\theta$ does not belong to


where 


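(12.9.2) $U_n^*(x_1,\ldots,x_n;\theta_0) = \sum_{p,q=1}^{r} B_{pq}^{(n)} [\sqrt{n}(\theta_{p0} - \hat\theta_p)][\sqrt{n}(\theta_{q0} - \hat\theta_q)]$

for all points in the sample space $R_n$ except possibly for a set of probability
zero, where

(12.9.3)

and $|\theta_{p0} - \theta_p^*| < |\theta_{p0} - \hat\theta_p|$, $p = 1,\ldots,r$.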


It can be verified from the conditions of 12.7.3 and from 4.3.7 and 

r 

4.3.8 that converges in probability to 2 as 

$n \to \infty$ and hence that $U_n$ and $U_n^*$, $n = 1, 2, \ldots$, are sequences of random
variables converging together in distribution to the chi-square distribution
$C(r)$. Now let us choose $\chi_\gamma^2$ such that

(12.9.4) $\lim_{n\to\infty} P(U_n < \chi_\gamma^2) = \lim_{n\to\infty} P(U_n^* < \chi_\gamma^2) = \gamma$


and let $E_\gamma(x_1,\ldots,x_n)$ and $E_\gamma^*(x_1,\ldots,x_n)$ be the sets of points
$y = (y_1,\ldots,y_r)$ in $\Omega_r$ such that

(12.9.5) $U_n(x_1,\ldots,x_n;y) < \chi_\gamma^2$

and

(12.9.6) $U_n^*(x_1,\ldots,x_n;y) < \chi_\gamma^2$

respectively. Therefore, if $\theta_0$ is the true value of $\theta$, we have

(12.9.7) $\lim_{n\to\infty} P(\theta_0 \in E_\gamma) = \lim_{n\to\infty} P(\theta_0 \in E_\gamma^*) = \gamma$,

that is, $E_\gamma$ and $E_\gamma^*$ are asymptotically equivalent confidence regions for
estimating $\theta_0$. But $E_\gamma^*$ is an ellipsoid in $\Omega_r$ centered at $\hat\theta$ whose volume is

(12.9.8) $\dfrac{K(r)\,\chi_\gamma^{r}}{\sqrt{n^{r}\,|B_{pq}^{(n)}|}}$

approximately, for large $n$, where $K(r)$ is a constant depending only on $r$.

Now let $g_n(x_1,\ldots,x_n;\theta)$ be a vector random variable with linearly
independent components $g_{pn}(x_1,\ldots,x_n;\theta)$, $p = 1,\ldots,r$, whose first
$\theta$-derivatives are

(12.9.9) $g_{pqn}(x_1,\ldots,x_n;\theta)$, $\quad p, q = 1,\ldots,r$.

Denote $g_{pn}(x_1,\ldots,x_n;\theta)$ and $g_{pqn}(x_1,\ldots,x_n;\theta)$ by $g_p(\theta)$ and $g_{pq}(\theta)$.
We assume that $g_p(\theta)$, $p = 1,\ldots,r$, and $g_{pq}(\theta)$, $p, q = 1,\ldots,r$, satisfy
conditions (iii) through (v) listed under (12.5.7) for
$g_n(x_1,\ldots,x_n;\theta)$ and its $\theta$-derivative respectively. Let the corresponding $A^*$, $B^*$ functions
be denoted by $A_{pn}^*(\theta_0, \theta)$, $p = 1,\ldots,r$, and $B_{pqn}^*(\theta_0, \theta)$, $p, q = 1,\ldots,r$,
where (12.5.8) holds with the obvious replacements. Let $B_{pq}^*(\theta_0, \theta) =
\lim_{n\to\infty} B_{pqn}^*(\theta_0, \theta)$, and $B_{pq}^* = B_{pq}^*(\theta_0, \theta_0)$. For (12.5.7) (i) we would have for
$j = 1$

(12.9.10) $\int_{R_n} \sqrt{n}\, g_p(\theta)\, dF_n = 0$

and for $j = 2$, we would have

(12.9.11) $\int_{R_n} [\sqrt{n}\, g_p(\theta)][\sqrt{n}\, g_q(\theta)]\, dF_n = C_{pqn}(\theta)$.

Corresponding to (12.5.7) (ii), we assume that if $\theta = \theta_0$ in the population
distribution, then $(\sqrt{n}\, g_p(\theta_0),\ p = 1,\ldots,r)$ is asymptotically distributed
according to the distribution $N(\{0\}, \|C_{pq}\|)$, for large $n$, where

(12.9.12) $C_{pq} = \lim_{n\to\infty} C_{pqn}(\theta_0)$,

and where $\|C_{pq}\|$ is positive definite.

A vector function $g_n(x_1,\ldots,x_n;\theta)$ satisfying the conditions in the
preceding paragraph will be called a regular estimating vector function for
$\theta_0$. The reader will note that the likelihood estimating vector function
$h_p(\theta) = (1/n) S_{pn}(x_1,\ldots,x_n;\theta)$ is a regular estimating vector function.
For such an estimating function it can be shown that if there exists a
sequence of unique vector solutions $(\tilde\theta_1,\ldots,\tilde\theta_r)$, for $n >$ some $n_0$, of the
equations

(12.9.13) $g_{pn}(x_1,\ldots,x_n;\theta) = 0$, $\quad p = 1,\ldots,r$, $\quad n = n_0, n_0 + 1, \ldots$

and if $\theta_0$ is the true value of $\theta$, then the sequence of random variables
$[\sqrt{n}(\tilde\theta_1 - \theta_{10}),\ldots,\sqrt{n}(\tilde\theta_r - \theta_{r0})]$, $n = n_0, n_0 + 1, \ldots$, has the $r$-dimensional
distribution $N(\{0\}, \|D^{pq}\|)$ in the limit as $n \to \infty$, where

(12.9.14) 2 

It follows from 9.3.2 that if $\theta_0$ is the true value of $\theta$, then $V_n(x_1,\ldots,x_n;\theta_0)$
defined by

(12.9.15) $V_n = n \sum_{p,q=1}^{r} C^{pq} g_p(\theta_0) g_q(\theta_0)$

has as its limiting distribution as $n \to \infty$ the chi-square distribution $C(r)$.

Furthermore, by an argument similar to that by which it was established
that if $\theta_0$ is the true value of $\theta$, the limiting distribution of each of the
random variables $U_n$ and $U_n^*$ as $n \to \infty$ is a chi-square distribution $C(r)$,
it can be shown that $V_n$ and the associated random variable
$V_n^*(x_1,\ldots,x_n;\theta_0)$ defined by

(12.9.16) $V_n^* = \sum_{p,q=1}^{r} D_{pq}^{(n)} [\sqrt{n}(\tilde\theta_p - \theta_{p0})][\sqrt{n}(\tilde\theta_q - \theta_{q0})]$

where

(12.9.17)

and where $|\theta_{p0} - \theta_p^*| < |\theta_{p0} - \tilde\theta_p|$, $p = 1,\ldots,r$,
converge together in distribution to the chi-square distribution $C(r)$ as
$n \to \infty$.

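The sets $E_\gamma'(x_1,\ldots,x_n)$ and $E_\gamma^{*\prime}(x_1,\ldots,x_n)$ defined as the sets of points
$y = (y_1,\ldots,y_r)$ in $\Omega_r$ for which

(12.9.18) $V_n(x_1,\ldots,x_n;y) < \chi_\gamma^2$

and

(12.9.19) $V_n^*(x_1,\ldots,x_n;y) < \chi_\gamma^2$

respectively, where $\chi_\gamma^2$ is chosen as in (12.9.4), are asymptotically equivalent
sets in $\Omega_r$, that is, the limits of $P(E_\gamma')$ and $P(E_\gamma^{*\prime})$ as $n \to \infty$ are equal.
The set $E_\gamma^{*\prime}$ is an ellipsoid whose volume, for large $n$, is approximately

(12.9.20) $\dfrac{K(r)\,\chi_\gamma^{r}}{\sqrt{n^{r}\,|D_{pq}^{(n)}|}}$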


Now the ratio $r_n$ of the volume of $E_\gamma^*$ given by (12.9.8) to that of $E_\gamma^{*\prime}$ given
by (12.9.20) converges (in probability) as $n \to \infty$ to the number $r$ given by

(12.9.21) $r = \dfrac{\sqrt{|D_{pq}|}}{\sqrt{|B_{pq}|}}$.

But substituting the expression for $D_{pq}$ given by (12.9.14), we find

(12.9.22) $r = \dfrac{|B_{pq}^*|}{\sqrt{|C_{pq}| \cdot |B_{pq}|}}$.


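By differentiating (12.9.10) with respect to $\theta_q$, we obtain the following
result

(12.9.23) $\int_{R_n} g_{pq}(\theta)\, dF_n + \int_{R_n} g_p(\theta) S_q(\theta)\, dF_n = 0$

where $S_q(\theta)$ stands for $S_{qn}(x_1,\ldots,x_n;\theta)$.

From (12.9.23) it is clear that at $\theta = \theta_0$,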


(12.9.24) 



—cov 



yjn /‘ 


But the left-hand side of (12.9.24) is ^o)- Therefore, the right-hand 

side is also ^o)’ which we shall abbreviate to Therefore, the 

sequence of 2r-dimensional random variables 



),..., yJngX%) 


“ 7 = • • 

V'l 



1 , 2 ,... 


converges in probability to a 2r-dimensional random variable having the 
covariance matrix 


(12.9.25) 


^PQ 

< 

Bl 

^PQ 



and since (12.9.25) is a covariance matrix, we have

(12.9.26) $|C_{pq}| \cdot |B_{pq}| \ge |B_{pq}^*|^2$.

Therefore the limit (in probability) of the ratio of volume of confidence
regions given by (12.9.22) cannot exceed 1.

Since $E_\gamma$ and $E_\gamma'$ are asymptotically equivalent to $E_\gamma^*$ and $E_\gamma^{*\prime}$ respectively,
this means that there exists no regular vector estimating function
$g(x_1,\ldots,x_n;\theta)$ for which the $100\gamma\%$ confidence region for $\theta_0$ defined by
$V_n(x_1,\ldots,x_n;y)$ in (12.9.18) is asymptotically smaller than the $100\gamma\%$
confidence region for $\theta_0$ similarly obtained by replacing $g_p(\theta)$ by $h_p(\theta)$.
We may summarize the preceding discussion and conclusions in the
following $r$-dimensional extension of 12.5.1.

12.9.1 Suppose $(x_1,\ldots,x_n)$ is a sample from the c.d.f. $F(x;\theta_0)$, where $\theta_0$
is $r$-dimensional, and $F(x;\theta)$ is regular with respect to all of its second
$\theta$-derivatives for $\theta$ in $\Omega_r$. Then if $g_{pn}(x_1,\ldots,x_n;\theta)$, $p = 1,\ldots,r$,
is any regular estimating vector function for $\theta_0$, an asymptotic
$100\gamma\%$ confidence region for $\theta_0$ is provided by the set $E_\gamma'$ defined by
the points $y = (y_1,\ldots,y_r)$ for which (12.9.18) holds. Furthermore,
there exists no regular estimating vector function which provides an
asymptotically smaller $100\gamma\%$ confidence region for $\theta_0$ than that
obtained by using the likelihood estimating vector function $h_{pn}(x_1,\ldots,x_n;\theta)$,
$p = 1,\ldots,r$, defined in (12.9.1). Thus the asymptotically
smallest $100\gamma\%$ confidence region consists of the points
$y = (y_1,\ldots,y_r)$ for which (12.9.5) is satisfied.


The problem of asymptotically smallest confidence regions for the case 
of multidimensional parameters has been discussed by Bartlett (1953), 
Beale (1960), and by Wilks and Daly (1939). The reader interested in 
further details should consult these papers. 

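Example. As an example of the problem of obtaining the asymptotically
smallest $100\gamma\%$ confidence region consider a sample from a multinomial
population, having the $k$-parameter p.f.

$dF(\xi;\theta) = \theta_1^{\xi_1} \cdots \theta_{k+1}^{\xi_{k+1}}$

where the random variable $(\xi_1,\ldots,\xi_{k+1})$ can take on one and only one of the
$k + 1$ values $(1, 0,\ldots,0),\ldots,(0,\ldots,0,1)$ and where $\theta_1 > 0,\ldots,\theta_{k+1} > 0$
with $\theta_1 + \cdots + \theta_{k+1} = 1$. We may, for convenience, take $\theta_1,\ldots,\theta_k$ as the $k$
parameters to be estimated by a confidence region. Suppose $(\xi_{1i},\ldots,\xi_{k+1,i};\ i =
1,\ldots,n)$ is a sample from this distribution. For $h_p(\theta)$ we have

$h_p(\theta) = \dfrac{1}{n}\sum_{i=1}^{n} \dfrac{\partial \log dF(\xi_i;\theta)}{\partial\theta_p} = \dfrac{1}{n}\sum_{i=1}^{n}\left(\dfrac{\xi_{pi}}{\theta_p} - \dfrac{\xi_{k+1,i}}{\theta_{k+1}}\right)$

where $\theta_{k+1} = 1 - \theta_1 - \cdots - \theta_k$. Setting $\sum_{i=1}^{n}\xi_{pi} = x_p$ we can write

$h_p(\theta) = \dfrac{1}{n}\left(\dfrac{x_p}{\theta_p} - \dfrac{x_{k+1}}{\theta_{k+1}}\right)$, $\quad p = 1,\ldots,k$

where $x_{k+1} = n - x_1 - \cdots - x_k$.

Furthermore

$B_{pq} = E\left[\left(\dfrac{\partial \log dF(\xi;\theta)}{\partial\theta_p}\right)\left(\dfrac{\partial \log dF(\xi;\theta)}{\partial\theta_q}\right)\right] = E\left[\left(\dfrac{\xi_p}{\theta_p} - \dfrac{\xi_{k+1}}{\theta_{k+1}}\right)\left(\dfrac{\xi_q}{\theta_q} - \dfrac{\xi_{k+1}}{\theta_{k+1}}\right)\right] = \dfrac{\delta_{pq}}{\theta_p} + \dfrac{1}{\theta_{k+1}}$, $\quad p, q = 1,\ldots,k$

where $\delta_{pq}$ is the Kronecker $\delta$ which has the value 1 if $p = q$ and 0 if $p \ne q$. The
element $B^{pq}$ in the inverse matrix is readily seen to be $\theta_p\delta_{pq} - \theta_p\theta_q$.
Therefore, for the quadratic form $U_n$ in (12.9.1) we find, after some algebraic
reduction,

$U_n = n\sum_{p,q=1}^{k} B^{pq} h_p(\theta) h_q(\theta) = \sum_{p=1}^{k+1} \dfrac{(x_p - n\theta_p)^2}{n\theta_p}$

which is the classical $\chi^2$ function originally proposed by K. Pearson (1900). Thus,
for large $n$, the $100\gamma\%$ asymptotically smallest confidence region for estimating
the $k$-dimensional parameter $\theta = (\theta_1,\ldots,\theta_k)$ is provided by the set $E_\gamma$ of real
and positive vectors $y = (y_1,\ldots,y_k)$ for which

$\sum_{p=1}^{k+1} \dfrac{(x_p - ny_p)^2}{ny_p} < \chi_\gamma^2$

where $y_{k+1} = 1 - y_1 - \cdots - y_k$ and $\chi_\gamma^2$ is the $100\gamma\%$ point of the chi-square
distribution $C(k)$.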

PROBLEMS 


12.1 Show that the two distributions having p.d.f.'s

$f_1(x;\theta) = 1/\theta$ for $0 < x < \theta$, and $= 0$ otherwise,

$f_2(x;\theta') = 1/\theta'$ for $0 < x < \theta'$, and $= 0$ otherwise,

are not absolutely continuous with respect to each other unless $\theta' = \theta$.

12.2 Suppose $x$ is a random variable having a distribution with c.d.f.


F(x ; 0) 



a: < 0 

X > 0 


where $\theta$ is any number in $(0, +\infty)$. Show that $E(S(x;\theta)) = 0$ and find
$\sigma^2(S(x;\theta))$. Determine $H(\theta, \theta')$ and show that for a fixed $\theta$ it has a maximum
with respect to $\theta'$ at $\theta' = \theta$.

12.3 A discrete random variable $x$ has the p.f.

$p(x;\theta) = (1 - \theta)\theta^{x-1}$, $\quad x = 1, 2, \ldots$

where $\theta$ is any number on $(0, 1)$. Show that $E(S(x;\theta)) = 0$ and find $\sigma^2(S(x;\theta))$.

12.4 If $(x_1,\ldots,x_n)$ is a sample from any distribution having finite $k$th
moment $E(x^k)$, show that $\dfrac{1}{n}\sum_{i=1}^{n} x_i^k$ is a consistent estimator for $E(x^k)$.

12.5 If $\bar{x}$ is the mean of a sample from the binomial distribution $Bi(1, p)$
show that $\bar{x}$ is sufficient for estimating $p$. Also show that $\bar{x}$ is an efficient
estimator for $p$.

12.6 Suppose $(x_1,\ldots,x_n)$ is a sample from the p.d.f.

$\theta e^{-\theta x}$, $\quad x > 0$

where $\theta$ is a positive parameter. Show that the sample mean $\bar{x}$ is sufficient for
estimating $\theta$ and that $(n - 1)/(n\bar{x})$ is the only unbiased estimator for $\theta$ depending
on $\bar{x}$.

12.7 (Continuation) Show that $\bar{x}$ is the maximum likelihood estimator for $1/\theta$
and that it is efficient.

12.8 If $\bar{x}$ and $s^2$ are the mean and variance of a sample of size $n > 1$ from a
normal distribution $N(\mu, \sigma^2)$, show that the two-dimensional random variable
$(\bar{x}, s^2)$ is sufficient for estimating the vector of parameters $(\mu, \sigma^2)$.

12.9 If $\tilde\theta_1$ and $\tilde\theta_2$ are two efficient estimators for a parameter $\theta$ show that the
correlation coefficient between $\tilde\theta_1$ and $\tilde\theta_2$ is unity.

12.10 If $\tilde\theta$ is an efficient estimator for $\theta$, and $\hat\theta$ is any other unbiased estimator
for $\theta$, and if $\sigma^2$ and $k\sigma^2$, $k > 1$, are the variances of $\tilde\theta$ and $\hat\theta$, show that the
correlation coefficient between $\tilde\theta$ and $\hat\theta$ equals $1/\sqrt{k}$.

12.11 (Continuation) If $\tilde\theta$ is an efficient estimator for $\theta$, whereas $\hat\theta_1$ and $\hat\theta_2$ are
inefficient but unbiased estimators with variances both equal to $k\sigma^2$, where
$k > 1$, show that the correlation coefficient between $\hat\theta_1$ and $\hat\theta_2$ is at least $(2 - k)/k$.

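12.12 Suppose $(x_1,\ldots,x_k)$ is a $k$-dimensional random variable having p.d.f.
$f(x_1,\ldots,x_k;\theta)$, for $\theta \in \Omega_0$. Let $g(x_1,\ldots,x_k)$ be an unbiased estimator for $\psi(\theta)$.
If $f$ is regular in its first $\theta$-derivative in $\Omega_0$ and if $\psi(\theta)$ has a derivative with respect
to $\theta$ in $\Omega_0$, show that

$\sigma^2(g) \ge \dfrac{[\psi'(\theta)]^2}{E\left[\left(\dfrac{\partial \log f}{\partial\theta}\right)^2\right]}$

and that the equality holds if and only if $\partial \log f/\partial\theta \equiv C(g - \psi(\theta))$, with
probability 1, where $C$ does not depend on $(x_1,\ldots,x_k)$.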

12.13 If $(x_{(1)},\ldots,x_{(n)})$ are the order statistics of a sample from the
rectangular distribution $R(\theta/2, \theta)$, show that $x_{(n)}$ is sufficient for estimating $\theta$ and
that $(n + 1)x_{(n)}/n$ is an efficient estimator for $\theta$. Show that $\left(x_{(n)},\ x_{(n)}/(1 - \gamma)^{1/n}\right)$
is a $100\gamma\%$ confidence interval for $\theta$.

12.14 If $(x_{(1)},\ldots,x_{(n)})$ are the order statistics of a sample from the
rectangular distribution $R(\tfrac{1}{2}(\theta_1 + \theta_2), \theta_2 - \theta_1)$, $\theta_2 > \theta_1 > 0$, show that the two-
dimensional random variable $(x_{(1)}, x_{(n)})$ provides sufficient estimators for $(\theta_1, \theta_2)$.
Show that

$\dfrac{n x_{(1)}}{n - 1} - \dfrac{x_{(n)}}{n - 1}$

and

$\dfrac{n x_{(n)}}{n - 1} - \dfrac{x_{(1)}}{n - 1}$

are minimum variance estimators of $\theta_1$ and $\theta_2$, respectively. Also show that

$\tfrac{1}{2}(x_{(1)} + x_{(n)})$ and $\dfrac{(n + 1)(x_{(n)} - x_{(1)})}{n - 1}$

are minimum variance estimators of the midpoint $(\theta_1 + \theta_2)/2$ and range $\theta_2 - \theta_1$.

12.15 Show that the variance of every unbiased estimator for $\sigma^2$ in samples
of size $n$ from $N(\mu, \sigma^2)$ is at least $2\sigma^4/n$.

12.16 If $(x_1,\ldots,x_n)$ is a sample from the normal distribution $N(\mu, \sigma^2)$ where
$\mu$ is known, show that

$\sqrt{\dfrac{\pi}{2}}\ \dfrac{1}{n}\sum_{i=1}^{n} |x_i - \mu|$

is an unbiased estimator for $\sigma$ with asymptotic efficiency $1/(\pi - 2)$ for large $n$.

12.17 If $(x_1,\ldots,x_n)$ is a sample from the Cauchy distribution having p.d.f.

$\dfrac{1}{\pi[1 + (x - \mu)^2]}$

show that the asymptotic efficiency of the sample median for estimating $\mu$ in
large samples is $8/\pi^2$.

12.18 If $(x_1,\ldots,x_n)$ is a sample from the distribution having p.d.f.

$\dfrac{\lambda^{k+1} x^{k} e^{-\lambda x}}{\Gamma(k + 1)}$, $\quad x > 0$

where $\lambda > 0$ and $k$ is a known constant, show that the maximum likelihood
estimator $\hat\lambda$ for $\lambda$ is $(k + 1)/\bar{x}$. Show that this estimator is biased but consistent
and that its asymptotic distribution for large $n$ is $N(\lambda, \lambda^2/[n(k + 1)])$.

12.19 (Continuation) Show that $\sqrt{k + 1}\,\log\hat\lambda$ is a function of $\hat\lambda$ whose
asymptotic distribution for large $n$ is normal with variance $1/n$.

12.20 If $(x_1,\ldots,x_n)$ is a sample from the binomial distribution $Bi(1, p)$,
show that the maximum likelihood estimator $\hat{p}$ for $p$ is $\bar{x}$ and that its asymptotic
distribution for large samples is $N(p, [p(1 - p)]/n)$. Show that $\sin^{-1}(2\hat{p} - 1)$
has an asymptotic normal distribution with variance $1/n$ in large samples.

12.21 If $\bar{x}$ is the mean of a sample of size $n$ from the binomial distribution
$Bi(1, p)$, show that the asymptotically shortest $100\gamma\%$ confidence interval for
large $n$ is given by the two values of $p$ which satisfy

$\dfrac{(\bar{x} - p)\sqrt{n}}{\sqrt{p(1 - p)}} = \pm y_\gamma$

where $P(-y_\gamma < u < +y_\gamma) = \gamma$, $u$ being a random variable having the
distribution $N(0, 1)$.

12.22 If $\bar{x}$ is the mean of a sample of size $n$ from the Poisson distribution
$Po(\mu)$ show that the asymptotically shortest $100\gamma\%$ confidence interval for $\mu$ for
large $n$ is given by the two values of $\mu$ which satisfy

$\dfrac{(\bar{x} - \mu)\sqrt{n}}{\sqrt{\mu}} = \pm y_\gamma$

where $y_\gamma$ is defined in the preceding problem.




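12.23 If $(x_{1\xi},\ldots,x_{k\xi};\ \xi = 1,\ldots,n)$ is a sample of size $n$ from the
multinomial distribution $M(1; p_1,\ldots,p_k)$, show that the maximum likelihood
estimators for $(p_1,\ldots,p_k)$ are $(\bar{x}_1,\ldots,\bar{x}_k)$, where $\bar{x}_i = \dfrac{1}{n}\sum_{\xi=1}^{n} x_{i\xi}$, and that the
covariance matrix of these estimators is

$\left\|\dfrac{1}{n}(\delta_{ij}p_i - p_ip_j)\right\|$

where $\delta_{ij} = 1$, $i = j$, and $\delta_{ij} = 0$, $i \ne j$.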

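12.24 If $(x_{1\xi},\ldots,x_{k\xi};\ \xi = 1,\ldots,n)$ is a sample of size $n$ from the
$k$-dimensional normal distribution $N(\{\mu_i\}, \|\sigma_{ij}\|)$, show that the maximum
likelihood estimator for the vector $(\mu_1,\ldots,\mu_k)$ is the vector of sample component
means $(\bar{x}_1,\ldots,\bar{x}_k)$ and the maximum likelihood estimator of the matrix $\|\sigma_{ij}\|$ is

$\left\|\dfrac{n - 1}{n}\, s_{ij}\right\|$

where

$s_{ij} = \dfrac{1}{n - 1}\sum_{\xi=1}^{n} (x_{i\xi} - \bar{x}_i)(x_{j\xi} - \bar{x}_j)$.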

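12.25 Suppose $(y_1,\ldots,y_n)$ are independent random variables having the
normal distributions

$N(\beta_1x_{11} + \cdots + \beta_kx_{k1}, \sigma^2),\ \ldots,\ N(\beta_1x_{1n} + \cdots + \beta_kx_{kn}, \sigma^2)$

respectively, where $x_{11},\ldots,x_{k1},\ldots,x_{1n},\ldots,x_{kn}$ are constants. Let

$a_{pq} = \sum_{\xi=1}^{n} x_{p\xi}x_{q\xi}$, $\quad p, q = 1,\ldots,k$

$a_{0p} = \sum_{\xi=1}^{n} y_\xi x_{p\xi}$, $\quad p = 1,\ldots,k$.

Show that the maximum likelihood estimators for $(\beta_1,\ldots,\beta_k)$ are

$\hat\beta_p = \sum_{q=1}^{k} a^{pq}a_{0q}$, $\quad p = 1,\ldots,k$

where $\|a^{pq}\| = \|a_{pq}\|^{-1}$, and $\|a_{pq}\|$ is assumed to be nonsingular.

Also show that the maximum likelihood estimator for $\sigma^2$ is

$(1/n)\,|a_{p_0q_0}| / |a_{pq}|$, $\quad p_0, q_0 = 0, 1,\ldots,k$; $\ p, q = 1,\ldots,k$,

where $a_{00} = \sum_{\xi=1}^{n} y_\xi^2$.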

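12.26 Suppose $(x_1,\ldots,x_n)$ has p.d.f. $f_n(x_1,\ldots,x_n;\theta_0)$ and $\hat\theta$ is an unbiased
estimator for $\theta_0$. Show that for any $\delta \ne 0$

$\sigma^2(\hat\theta \mid \theta_0) \ge \dfrac{\delta^2}{\sigma^2(g_\delta \mid \theta_0)}$

where

$g_\delta = \dfrac{f_n(x_1,\ldots,x_n;\theta_0 + \delta)}{f_n(x_1,\ldots,x_n;\theta_0)}$

and hence that

$\sigma^2(\hat\theta \mid \theta_0) \ge \dfrac{1}{\inf_\delta\left[(1/\delta^2)\,\sigma^2(g_\delta \mid \theta_0)\right]}$,

a result due to Chapman and Robbins (1951).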

12.27 Suppose $(x_1,\ldots,x_n)$ is a random variable having p.d.f. $f_n(x_1,\ldots,x_n;\theta)$
such that (12.2.25) holds, that is, such that a sufficient statistic $\hat\theta$ exists for
$\theta$, and suppose $f_n(x_1,\ldots,x_n;\theta)$ and $v(\hat\theta;\theta)$ have second partial derivatives
$\partial^2 f_n/(\partial x_i\,\partial\theta)$ and $\partial^2 v/(\partial x_i\,\partial\theta)$ for each point in $R_n \times \Omega$ (except possibly for a set of
probability 0) where $R_n$ is the sample space and $\Omega$ is the parameter space (an
open interval on the real line). Show that $f_n(x_1,\ldots,x_n;\theta)$ must be of form

$\exp[K_1(\theta)g(\hat\theta) + K_2(\theta) + h_n(x_1,\ldots,x_n)]$,

where $K_1(\theta)$ and $K_2(\theta)$ do not depend on
$(x_1,\ldots,x_n)$; $h_n(x_1,\ldots,x_n)$ does not depend on $\theta$ and $g(\hat\theta)$ depends on $\hat\theta$ and not
$\theta$. [Koopman (1936) and Pitman (1936).]

12.28 (Continuation) Functions of sufficient estimators as minimum variance
estimators of their mean values [Rao (1949)]. Let $u(g(\hat\theta))$ be a function of $\hat\theta$ whose
mean value is $\varphi(\theta)$. Let $v(g(\hat\theta))$ be another function whose mean value is $\varphi(\theta)$.
Then if $w(g(\hat\theta)) = u(g(\hat\theta)) - v(g(\hat\theta))$, we have

$\int_{R_n} w(g)\exp[K_1(\theta)g + K_2(\theta) + h_n]\,dx_1 \cdots dx_n = 0$, for all $\theta \in \Omega$.

Show that if this expression can be repeatedly differentiated with respect to $\theta$
under the integral and if $w(g)$ can be represented by a Taylor series, then
$E((w(g))^2) = 0$ for all $\theta \in \Omega$, which implies that $w(g) = 0$ with probability 1 for all
$\theta \in \Omega$. Hence $u(g(\hat\theta))$ differs at most from $v(g(\hat\theta))$ over a set of probability zero,
thus showing that under the assumptions made $u(g(\hat\theta))$ is a unique unbiased
estimator for $\varphi(\theta)$ and hence from 12.2.3 that $u(g(\hat\theta))$ has minimum variance as an
estimator of its mean value.



CHAPTER 13 


Testing Parametric Statistical Hypotheses 


13.1 INTRODUCTORY REMARKS AND DEFINITIONS 

This chapter is devoted primarily to the basic principles of the theory of
testing parametric statistical hypotheses originally set forth by Neyman
and Pearson (1928, 1933), and the relations between this theory and
maximum likelihood estimation theory in large samples. The ideas of statistical
hypothesis-testing have been considerably extended and generalized in
the last few years; however, we shall not attempt to cover these extensions
in detail here. The reader interested in them will find an excellent treatment
of the subject by Lehmann (1959).

A few of the basic concepts and results for finite samples are introduced 
in this section; later sections deal with asymptotic theory for large 
samples. The asymptotic theory for large samples is closely related to 
parametric estimation theory for large samples as has been presented in 
Chapter 12. 

Suppose $(x_1,\ldots,x_n)$ is an $n$-dimensional random variable having c.d.f.
$F_n(x_1,\ldots,x_n;\theta)$ where $\theta$ is an $r$-dimensional parameter and its space is $\Omega$
which, for the present, will be an open region in the Euclidean space $R_r$.
The most important case of $F_n(x_1,\ldots,x_n;\theta)$ arises if $(x_1,\ldots,x_n)$ is a random
sample from a c.d.f. $F(x;\theta)$, in which case $F_n(x_1,\ldots,x_n;\theta) = \prod_{i=1}^{n} F(x_i;\theta)$.
Most of this chapter is devoted to the case of random sampling.

Let $\omega$ be a subset of $\Omega$. If $\theta_0$ is the true value of $\theta$ in the population,
we shall set up the statistical hypothesis $\mathcal{H}$ that $\theta_0 \in \omega$ against the
alternative that $\theta_0 \in \Omega - \omega$, or more briefly, $\mathcal{H}(\omega; \Omega)$. We say that $\mathcal{H}$ is true
if $\theta_0 \in \omega$ and false if $\theta_0 \in \Omega - \omega$. The assumption that $\theta_0 \in \omega$ is sometimes
referred to as the null hypothesis. Unless ambiguity arises, we shall drop
the 0 and write $\theta_0$ as $\theta$. If $\omega$ contains one point, the hypothesis $\mathcal{H}$ is
called a simple hypothesis; otherwise a composite hypothesis. We decide
to accept or reject $\mathcal{H}$ on the basis of $(x_1,\ldots,x_n)$. More precisely, let $W_n$
be a set in the sample space $R_n$ which does not depend on $\theta$ such that if
$(x_1,\ldots,x_n) \in W_n$ we reject $\mathcal{H}$; otherwise we accept $\mathcal{H}$. Two types of
errors are recognized here: a Type I error is committed if $\mathcal{H}$ is rejected
when it is true, and a Type II error is committed if $\mathcal{H}$ is accepted when it
is false. The selection of $W_n$ and the occurrence of $(x_1,\ldots,x_n)$ in $W_n$
as the criterion for rejecting $\mathcal{H}$ is called a statistical test, or more briefly,
a test of $\mathcal{H}$; $W_n$ is called the critical set or critical region of the test.

There will be no ambiguity if, for the sake of brevity, we refer to $W_n$ as
the test of $\mathcal{H}$. The quantity*

(13.1.1) $P(W_n \mid \theta) = \int_{W_n} dF_n(x_1,\ldots,x_n;\theta)$

is called the power of the test $W_n$ and is a function of $\theta$ whose values lie on
the interval $[0, 1]$. $P(\overline{W}_n \mid \theta) = 1 - P(W_n \mid \theta)$ as a function of $\theta$ is called
the operating characteristic function of the test $W_n$. The probabilities of
committing Type I and Type II errors are respectively,

(13.1.2) $P(W_n \mid \theta)$, for $\theta \in \omega$

and

(13.1.3) $1 - P(W_n \mid \theta)$, for $\theta \in \Omega - \omega$.

These two probabilities are frequently called the risks of Type I and Type II
errors, respectively.
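
To make the power function and the two risks concrete, here is a small Python sketch, offered as an illustration rather than as part of the original text. It assumes a hypothetical test based on a binomial count $x$ out of $n$ trials with critical set $W_n = \{x \ge c\}$, so that the power $P(W_n \mid \theta)$ is an exact finite sum. The function names and the choice of example are assumptions.

```python
from math import comb


def power(n, critical_value, theta):
    """P(W_n | theta) for W_n = {x >= critical_value}, x ~ Binomial(n, theta)."""
    return sum(
        comb(n, x) * theta ** x * (1 - theta) ** (n - x)
        for x in range(critical_value, n + 1)
    )


def operating_characteristic(n, critical_value, theta):
    """1 - P(W_n | theta): the probability of accepting the hypothesis at theta."""
    return 1.0 - power(n, critical_value, theta)


def type_one_risk(n, critical_value, theta_in_omega):
    """Risk of rejecting when theta lies in omega, as in (13.1.2)."""
    return power(n, critical_value, theta_in_omega)


def type_two_risk(n, critical_value, theta_outside_omega):
    """Risk of accepting when theta lies in Omega - omega, as in (13.1.3)."""
    return operating_characteristic(n, critical_value, theta_outside_omega)
```

For $n = 20$ and $c = 15$, `type_one_risk(20, 15, 0.5)` is about 0.0207, and the power increases with $\theta$, so the Type II risk shrinks as $\theta$ moves away from 0.5.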

To be satisfactory, a test must, of course, satisfy certain criteria. A test
$W_n$ would be ideal if the probabilities for Type I and Type II errors,
as given in (13.1.2) and (13.1.3), were both zero. This requirement,
however, is obviously too much to hope for in general; we must be
satisfied with less. A desirable property of a test is that of being
unbiased, that is, of satisfying the conditions

(13.1.4) $P(W_n \mid \theta) \le \alpha$, if $\theta \in \omega$

and

(13.1.5) $P(W_n \mid \theta) \ge \alpha$, if $\theta \in \Omega - \omega$

for a given level of significance $\alpha$, $0 < \alpha < 1$. This means that the Type I
error is controlled by requiring its probability not to exceed $\alpha$ and at the
same time we are assured that the probability of rejecting $\mathcal{H}$ if it is not
true (that is, if $\theta \in \Omega - \omega$) is at least $\alpha$. If test $W_n$ is such that $P(W_n \mid \theta') = \alpha$
for some $\theta' \in \omega$, $W_n$ is said to be a test of size $\alpha$ for $\mathcal{H}$ and is denoted
by $W_{n\alpha}$. If there is no ambiguity we shall henceforth use $W_\alpha$ to denote a
test for a given $\alpha$ and sample size $n$. A test not unbiased is said to be biased.

* Some authors prefer to introduce the characteristic function $\varphi_{W_n}$ of the set $W_n$,
where $\varphi_{W_n} = 1$ at all points in $W_n$ and 0 otherwise, in which case $P(W_n \mid \theta) = E(\varphi_{W_n} \mid \theta)$.

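If the test $W_\alpha$ is such that equality in (13.1.4) holds for all $\theta \in \omega$, then
$W_\alpha$ is said to be similar to the sample space, and is called a similar
critical region.

If

$\lim_{n\to\infty} P(W_\alpha \mid \theta) = 1$, for $\theta \in \Omega - \omega$,

then $W_\alpha$ is said to be a consistent test of size $\alpha$ for the hypothesis $\mathcal{H}(\omega; \Omega)$,
a term introduced by Wald and Wolfowitz (1940).

If $W_\alpha$ and $W_\alpha^*$ are two tests for $\mathcal{H}(\omega; \Omega)$ for a given $\alpha$, such that

(13.1.6) $P(W_\alpha^* \mid \theta) \le P(W_\alpha \mid \theta)$, if $\theta \in \omega$

and

(13.1.7) $P(W_\alpha^* \mid \theta) \ge P(W_\alpha \mid \theta)$, if $\theta \in \Omega - \omega$,

then the test $W_\alpha^*$ is uniformly more powerful than the test $W_\alpha$; $W_\alpha^*$
would be preferred to $W_\alpha$. If, for every test $W_\alpha$, $W_\alpha^*$ satisfies (13.1.6)
and (13.1.7), the test $W_\alpha^*$ is a uniformly most powerful test for $\mathcal{H}$.

Example. To illustrate some of the preceding ideas with an important
example, let $(x_1,\ldots,x_n)$ be a sample from a population having the normal
distribution $N(\mu, \sigma^2)$. Consider the composite hypothesis $\mathcal{H}(\omega; \Omega)$ where $\Omega$ is
the set of points $(\mu, \sigma^2)$ for which $\sigma^2 > 0$ (that is, a Euclidean half-plane) and $\omega$
is the subset of $\Omega$ for which $\mu = \mu_0$ (that is, the points of the line $\mu = \mu_0$ in $\Omega$).

We know from 8.4.3 that if $\mathcal{H}$ is true $\sqrt{n}(\bar{x} - \mu_0)/s$ has the Student distribution
$S(n - 1)$. Let us take as $W_\alpha$ the set of points in $R_n$ for which

$\left|\dfrac{\sqrt{n}(\bar{x} - \mu_0)}{s}\right| > t_\alpha$

where

$1 - \int_{-t_\alpha}^{+t_\alpha} f_{n-1}(t)\,dt = \alpha$

and $f_{n-1}(t)$ is the p.d.f. of the Student distribution $S(n - 1)$ given by (7.8.4).
Then we have

$P(W_\alpha \mid (\mu, \sigma^2) \in \omega) = P(|t| > t_\alpha) = \alpha$,

and hence $W_\alpha$ is a test of size $\alpha$ for $\mathcal{H}(\omega; \Omega)$ which has the property of being
similar to the sample space. Furthermore, for any point $(\mu_1, \sigma_1^2)$ in $\Omega - \omega$, we
have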


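$P(W_\alpha \mid (\mu_1, \sigma_1^2)) = P\left(\left|t + \dfrac{\delta\sqrt{n-1}}{u}\right| > t_\alpha\right)$

where $\delta = (\mu_1 - \mu_0)\sqrt{n}/\sigma_1$ and $t$ and $u$ are random variables having the
following p.e. over the half-plane $u > 0$:

$\dfrac{u^{n-1}\, e^{-\frac{1}{2}\left(1 + t^2/(n-1)\right)u^2}}{2^{(n-3)/2}\sqrt{2\pi(n-1)}\ \Gamma\left(\frac{n-1}{2}\right)}\, dt\, du$.

Denoting this p.e. by $f(t, u)\,dt\,du$ and noting that for all values of $u$ in $(0, \infty)$

$\int_{-t_\alpha - \delta\sqrt{n-1}/u}^{+t_\alpha - \delta\sqrt{n-1}/u} f(t, u)\,dt < \int_{-t_\alpha}^{+t_\alpha} f(t, u)\,dt$, $\quad \delta \ne 0$,

it follows that

$P\left(\left|t + \dfrac{\delta\sqrt{n-1}}{u}\right| > t_\alpha\right) > P(|t| > t_\alpha) = \alpha$, $\quad \delta \ne 0$

which, of course, means that $W_\alpha$ is unbiased for testing $\mathcal{H}$.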

Unfortunately, unbiased and uniformly most powerful tests exist only
in various special situations. In the case of a simple hypothesis, there
is a fairly general solution for the case where $\Omega - \omega$ contains one
point, and a solution under certain conditions if $\Omega - \omega$ is a set of
more than one point. The case of a simple hypothesis will be considered
in Section 13.2. In case of composite hypotheses, solutions have been
found only in special situations, particularly where $W_\alpha$ is similar to the
sample space. But uniformly most powerful unbiased tests exist in an
asymptotic sense for some composite hypotheses for large samples in case
of population distributions for which maximum likelihood estimators
exist and are asymptotically normally distributed. We shall deal with
this problem in Sections 13.3 and 13.4.

Historical remark. In developing the theory of acceptance sampling for
accepting or rejecting lots of mass-produced articles, Dodge and Romig of the
Bell Telephone Laboratories introduced the concepts of producer's risk and
consumer's risk about 1925. Their first published results appeared four years
later [See Dodge and Romig (1929, 1959)]. These concepts were, in fact, the
forerunners respectively, of risks of Type I and Type II errors given by (13.1.2)
and (13.1.3). More precisely, suppose $\theta$ is the fraction of defective articles in a
lot of $N$ articles, and $t$ is the fraction of defective articles in a sample of $n$ articles
from the lot. Then for a given value of $\theta$ in the lot, $t$ is a random variable with a
c.d.f. $F_n(t;\theta)$, the sample space of $t$ being the points $0, 1/n,\ldots,r/n,\ldots,1$, $r \le n$, in the interval
$[0, 1]$ and the parameter space of $\theta$ being the points $0, 1/N,\ldots,1$ in the
interval $[0, 1]$. Given a value $\theta_0$ of $\theta$ an unacceptable fraction defective for
the lot would be one for which $\theta > \theta_0$, in which case $\Omega - \omega = (\theta_0, 1)$, whereas an
acceptable fraction defective (zone of preference) would be one for which $\theta \le \theta_0$,
in which case $\omega = (0, \theta_0)$. Now we choose $n$ and the critical set $W_n = (t_0, 1]$,
so that we reject the lot if $t \in W_n$ and so that $P(t \in W_n \mid \theta \in \omega) \le \alpha$. The quantity
$P(t \in W_n \mid \theta \in \omega)$ is the producer's risk, that is, the probability the inspector will
reject the lot if it has an acceptable fraction defective $\theta$. The producer's risk is
controlled so as not to exceed $\alpha$, being (approximately) $\alpha$ if $\theta = \theta_0$. In practice
$\alpha$ is usually taken as 0.05 or 0.10. The quantity $P(t \in \overline{W}_n \mid \theta \in \Omega - \omega) = \beta$ is the
consumer's risk, that is, the probability the inspector of the lot will accept it if it
has an unacceptable fraction defective $\theta$. In practice, if $N$ is sufficiently large, one
can choose $n$ large enough to make the consumer's risk arbitrarily small if the
fraction defective in the lot has any specific value exceeding $\theta_0$. Ordinarily,
the sample size $n$ is chosen so that the consumer's risk $\beta$ has a given value
(usually 0.05 or 0.10) for some given value of $\theta$, say $\theta_1$, slightly larger than $\theta_0$.
The interval $(\theta_0, 1)$ of unacceptable values of $\theta$ is broken into two sets: $(\theta_0, \theta_1)$
a zone of indifference, and $(\theta_1, 1)$ a zone of rejection. Note that we can calculate
$P(t \in \overline{W}_n \mid \theta)$ for a sample of size $n$ from a lot with any given fraction defective $\theta$,
and it is the probability of accepting the lot if the fraction of defectives is $\theta$; it is
a function of $\theta$ which may be denoted by $L(\theta)$. The graph of $L(\theta)$ is called the
operating characteristic curve of the sampling plan specified by the sample size $n$
and critical fraction defective $t_0$. It satisfies the conditions $L(\theta_0) = 1 - \alpha$ and
$L(\theta_1) = \beta$.
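
The operating characteristic curve of a sampling plan can be computed exactly when the sample is drawn without replacement, since the number of defectives in the sample is then hypergeometric. The sketch below is an illustration under that assumption and is not taken from the original text; the plan is described by a hypothetical acceptance number (the largest number of sample defectives for which the lot is still accepted), and all names are illustrative.

```python
from math import comb


def acceptance_probability(lot_size, defectives, sample_size, accept_max):
    """L(theta) for theta = defectives / lot_size: the probability that a sample
    drawn without replacement contains at most accept_max defective articles."""
    total = comb(lot_size, sample_size)
    good = lot_size - defectives
    ways = sum(
        comb(defectives, d) * comb(good, sample_size - d)
        for d in range(0, min(accept_max, sample_size, defectives) + 1)
    )
    return ways / total


def producer_risk(lot_size, defectives_acceptable, sample_size, accept_max):
    """Probability of rejecting a lot whose fraction defective is acceptable."""
    return 1.0 - acceptance_probability(
        lot_size, defectives_acceptable, sample_size, accept_max
    )


def consumer_risk(lot_size, defectives_unacceptable, sample_size, accept_max):
    """Probability of accepting a lot whose fraction defective is unacceptable."""
    return acceptance_probability(
        lot_size, defectives_unacceptable, sample_size, accept_max
    )
```

Evaluating `acceptance_probability` over a grid of defective counts traces out the curve $L(\theta)$, which decreases from 1 at $\theta = 0$.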

A second important concept introduced by Dodge and Romig (1929) was that 
of average outgoing quality limit, which is defined as the maximum (over all 
values of 0 ) of the mean value of the fraction of defectives in a lot assuming that 
rejected lots are screened of defective items and then accepted. This assumes, of 
course, nondestructive testing, that is, that a defective item can be determined 
without destroying the item upon testing it. 

The basic concepts of Type I and Type II errors have long played a
fundamental role in the administration of criminal law, the counterparts of risks of
Type I and Type II errors being respectively, the risks of convicting an innocent
person and of acquitting a guilty one.

13.2 TEST OF A SIMPLE HYPOTHESIS 
(a) Case of Two Alternatives 

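In case $\mathcal{H}$ is a simple hypothesis which we may conveniently describe
for the moment by taking one point $\theta_0$ in $\omega$ and one point $\theta_1$ in $\Omega - \omega$,
and if $(x_1,\ldots,x_n)$ has p.d.f. $f_n(x_1,\ldots,x_n;\theta)$ for $\theta_0$ and $\theta_1$, there exists,
under conditions to be stated below, an unbiased and a most powerful
test for $\mathcal{H}(\theta_0; \theta_0 \cup \theta_1)$. This problem was originally investigated by
Neyman and Pearson (1933). The following result, due to them, provides
a most powerful unbiased test for $\mathcal{H}$.

13.2.1 Suppose $(x_1,\ldots,x_n)$ is a random variable with p.d.f.
$f_n(x_1,\ldots,x_n;\theta)$ whose parameter space $\Omega$ consists of two
distinct points which, for convenience, we label $\theta_0$ and $\theta_1$. For
a given $c_\alpha > 0$, let $W_\alpha$ be the set in the sample space $R_n$ for which

(13.2.1) $f_n(x_1,\ldots,x_n;\theta_1) \ge c_\alpha f_n(x_1,\ldots,x_n;\theta_0)$

and where

$P(W_\alpha \mid \theta_0) = \alpha$.

If $W_\alpha^*$ is any other set in $R_n$ such that

$P(W_\alpha^* \mid \theta_0) = \alpha$,

then

(13.2.2) $P(W_\alpha \mid \theta_1) \ge P(W_\alpha^* \mid \theta_1)$,

that is, $W_\alpha$ is unbiased and is a more powerful test for testing
$\mathcal{H}(\theta_0; \theta_0 \cup \theta_1)$ than any other test $W_\alpha^*$ of size $\alpha$.

To establish 13.2.1, we first show that $W_\alpha$ is a more powerful test
than $W_\alpha^*$. We note that for a given $c_\alpha > 0$, $W_\alpha$ is the event in the space
of $(x_1,\ldots,x_n)$ for which (13.2.1) holds, the amount of probability on $W_\alpha$
for $\theta = \theta_0$ being $\alpha$. That is, $W_\alpha$ is a test of size $\alpha$ for $\mathcal{H}(\theta_0; \theta_0 \cup \theta_1)$.

Now suppose $W_\alpha^*$ is any other test for $\mathcal{H}$ of size $\alpha$, that is

(13.2.3) $P(W_\alpha^* \mid \theta_0) = \alpha$.

We then have

(13.2.4) $P(W_\alpha - (W_\alpha \cap W_\alpha^*) \mid \theta_0) = P(W_\alpha^* - (W_\alpha \cap W_\alpha^*) \mid \theta_0)$.

Now for any point in the sample space $R_n$ contained in $W_\alpha$, we have
$f_n(x_1,\ldots,x_n;\theta_1) \ge c_\alpha f_n(x_1,\ldots,x_n;\theta_0)$, and hence

(13.2.5) $P(W_\alpha - (W_\alpha \cap W_\alpha^*) \mid \theta_1) \ge c_\alpha P(W_\alpha - (W_\alpha \cap W_\alpha^*) \mid \theta_0)$.

For any point not contained in $W_\alpha$, we have
$f_n(x_1,\ldots,x_n;\theta_1) < c_\alpha f_n(x_1,\ldots,x_n;\theta_0)$ and therefore

(13.2.6) $c_\alpha P(W_\alpha^* - (W_\alpha \cap W_\alpha^*) \mid \theta_0) \ge P(W_\alpha^* - (W_\alpha \cap W_\alpha^*) \mid \theta_1)$.

Making use of (13.2.4) in (13.2.5) and (13.2.6), we obtain

(13.2.7) $P(W_\alpha - (W_\alpha \cap W_\alpha^*) \mid \theta_1) \ge P(W_\alpha^* - (W_\alpha \cap W_\alpha^*) \mid \theta_1)$.

Now $W_\alpha - (W_\alpha \cap W_\alpha^*)$ and $W_\alpha \cap W_\alpha^*$ are disjoint sets, and so are
$W_\alpha^* - (W_\alpha \cap W_\alpha^*)$ and $W_\alpha \cap W_\alpha^*$. Hence we may add $P(W_\alpha \cap W_\alpha^* \mid \theta_1)$
to both sides of (13.2.7) and obtain (13.2.2), that is,

$P(W_\alpha \mid \theta_1) \ge P(W_\alpha^* \mid \theta_1)$

which is equivalent to the statement that $W_\alpha$ is a more powerful test than
$W_\alpha^*$ for $\mathcal{H}$.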
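We must now show that $W_\alpha$ is unbiased.
Now $W_\alpha$ is obviously unbiased if $c_\alpha \ge 1$ in (13.2.1). If $c_\alpha < 1$, there
exists a subset $W_\alpha'$ of $W_\alpha$ for which

(13.2.8)    $f_n(x_1, \ldots, x_n; \theta_1) > f_n(x_1, \ldots, x_n; \theta_0)$, if $(x_1, \ldots, x_n) \in W_\alpha'$.

Otherwise it follows that throughout $R_n$

(13.2.9)    $0 \le f_n(x_1, \ldots, x_n; \theta_1) \le f_n(x_1, \ldots, x_n; \theta_0)$

with strict inequality on a set of positive probability (the two
distributions being distinct), which contradicts the fact that

$\int_{R_n} f_n(x_1, \ldots, x_n; \theta_0)\, dx_1 \cdots dx_n = \int_{R_n} f_n(x_1, \ldots, x_n; \theta_1)\, dx_1 \cdots dx_n = 1.$

It is evident that if $R_n$ is the entire sample space

(13.2.10)    $P(W_\alpha' \mid \theta_1) - P(W_\alpha' \mid \theta_0) = P(R_n - W_\alpha' \mid \theta_0) - P(R_n - W_\alpha' \mid \theta_1)$
             $\ge P(W_\alpha - W_\alpha' \mid \theta_0) - P(W_\alpha - W_\alpha' \mid \theta_1),$

that is,

(13.2.11)    $P(W_\alpha \mid \theta_1) \ge P(W_\alpha \mid \theta_0) = \alpha$

and hence $W_\alpha$ is unbiased.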

A theorem similar to 13.2.1 can be stated in the case where $(x_1, \ldots, x_n)$
is a discrete $n$-dimensional random variable with p.f. $p_n(x_1, \ldots, x_n; \theta)$ where
the distributions having p.f.'s $p_n(x_1, \ldots, x_n; \theta_0)$ and $p_n(x_1, \ldots, x_n; \theta_1)$
are the alternative distributions involved in the statistical hypothesis $\mathcal{H}$.

Finally, we remark that where $(x_1, \ldots, x_n)$ is a random sample from a
distribution having p.d.f. $f(x; \theta)$ we would merely replace $f_n(x_1, \ldots, x_n; \theta)$
in 13.2.1 by $\prod_{\xi=1}^{n} f(x_\xi; \theta)$. A similar remark holds if $(x_1, \ldots, x_n)$ is a
sample from a discrete distribution having p.f. $p(x; \theta)$.
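The discrete form of 13.2.1 can be checked by brute force on a small sample space. The sketch below is only an illustration: the four-point sample space and the two p.f.'s `p0` and `p1` are hypothetical choices, not taken from the text. It forms the set on which $p(x; \theta_1) > c_\alpha p(x; \theta_0)$ and compares its power with that of every other region of exactly the same size.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical p.f.'s on the sample space {0, 1, 2, 3} (illustrative only).
p0 = [Fraction(4, 10), Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]  # theta_0
p1 = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]  # theta_1
points = range(len(p0))


def prob(region, p):
    """Probability of `region` under the p.f. `p` (exact rational arithmetic)."""
    return sum((p[x] for x in region), Fraction(0))


def lr_region(c):
    """Sample points with p1(x) > c * p0(x), the discrete analogue of (13.2.1)."""
    return frozenset(x for x in points if p1[x] > c * p0[x])


w_alpha = lr_region(Fraction(1))   # c_alpha = 1
alpha = prob(w_alpha, p0)          # size of the test
power = prob(w_alpha, p1)          # power against theta_1

# Every region W* with exactly the same size alpha under theta_0.
rivals = [frozenset(s)
          for r in range(len(p0) + 1)
          for s in combinations(points, r)
          if prob(s, p0) == alpha]
best_rival_power = max(prob(s, p1) for s in rivals)

print(sorted(w_alpha), alpha, power, best_rival_power)
```

Exact fractions are used so that the condition "same size $\alpha$" is tested without rounding error; with floating-point numbers the equality test on sizes could fail spuriously.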

(b) Case of Similar Critical Regions 

For certain special p.d.f.'s $f_n(x_1, \ldots, x_n; \theta)$ the family of sets defined by
(13.2.1) for all positive values of $c_\alpha$ and for the alternative $\theta = \theta_1$ is the
same family of sets for all values of $\theta_1$ in $\Omega$ such that $\theta_1 \ne \theta_0$ in (13.2.2).
In this case, $W_\alpha$ is the uniformly most powerful unbiased test of size $\alpha$
for the hypothesis $\mathcal{H}(\theta_0; \Omega)$.

In particular, if there exists a sufficient estimator $\hat\theta$ for $\theta$ having p.d.f.
$v(\hat\theta; \theta)$ such that the family of sets in $R_n$ defined by $v(\hat\theta; \theta_1) > c_\alpha v(\hat\theta; \theta_0)$
for all positive values of $c_\alpha$ is the same family as that obtained by any other
value of $\theta_1 \ne \theta_0$ in $\Omega$, then for the given $\alpha$ one could find a uniformly most
powerful unbiased test from this family for $\mathcal{H}(\theta_0; \Omega)$ which would
clearly be determined by the sufficient estimator $\hat\theta$. But even this approach
holds little hope except for the case of a one-dimensional parameter $\theta$.
It is therefore sufficient to give an example to illustrate the existence of
such a test.

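Example. Suppose $(x_1, \ldots, x_n)$ is a sample from $N(\mu, 1)$. Let the parameter
space of $\mu$ be the real line $R_1$. Now consider the hypothesis $\mathcal{H}(\mu_0; R_1)$.

Applying (13.2.1) with the alternative $\mu = \mu_1$, we find after some reduction
that $W_\alpha$ is the set in the sample space $R_n$ for which

$(\mu_1 - \mu_0)\,\bar{x} > \frac{1}{n} \log c_\alpha + \tfrac{1}{2}(\mu_1^2 - \mu_0^2)$

where $c_\alpha > 0$. Thus, $W_\alpha$ is determined by the sufficient estimator $\bar{x}$ for $\mu$. When
we recall that if $(x_1, \ldots, x_n)$ is a sample from $N(\mu, 1)$, then $\bar{x}$ has the distribution
$N(\mu, 1/n)$, it is evident that:

(i) If $\mu_1 > \mu_0$, $W_\alpha$ is the set in $R_n$ for which $\bar{x} > z_\alpha$ where $z_\alpha$ is defined by the
equation

$1 - \Phi(\sqrt{n}(z_\alpha - \mu_0)) = \alpha,$

$\Phi(x)$ being the c.d.f. of $N(0, 1)$.

(ii) If $\mu_1 < \mu_0$, $W_\alpha$ is the set for which $\bar{x} < z_\alpha'$ where

$\Phi(\sqrt{n}(z_\alpha' - \mu_0)) = \alpha.$

In case (i), the test with critical set defined by $\bar{x} > z_\alpha$ is uniformly most
powerful for testing the hypothesis $\mathcal{H}(\mu_0; \mu \ge \mu_0)$, that is, that $\mu = \mu_0$ against
any alternative $\mu = \mu_1$ where $\mu_1 > \mu_0$. Since

$1 - \Phi(\sqrt{n}(z_\alpha - \mu_1)) > 1 - \Phi(\sqrt{n}(z_\alpha - \mu_0))$, if $\mu_1 > \mu_0$,

it is evident that this test is unbiased if $\Omega$ consists of all values of $\mu \ge \mu_0$.
Similar remarks hold for case (ii).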
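Case (i) of the example can be evaluated numerically. The following is a minimal sketch using only the Python standard library; the function names and the values $\mu_0 = 0$, $n = 25$, $\alpha = 0.05$ are illustrative assumptions. It solves $1 - \Phi(\sqrt{n}(z_\alpha - \mu_0)) = \alpha$ for $z_\alpha$ and evaluates the power $1 - \Phi(\sqrt{n}(z_\alpha - \mu_1))$ at several alternatives.

```python
from math import sqrt
from statistics import NormalDist

PHI = NormalDist()  # standard normal distribution N(0, 1)


def critical_value(mu0, n, alpha):
    """z_alpha solving 1 - Phi(sqrt(n) * (z_alpha - mu0)) = alpha."""
    return mu0 + PHI.inv_cdf(1.0 - alpha) / sqrt(n)


def power(mu1, mu0, n, alpha):
    """P(xbar > z_alpha | mu = mu1) = 1 - Phi(sqrt(n) * (z_alpha - mu1))."""
    z_alpha = critical_value(mu0, n, alpha)
    return 1.0 - PHI.cdf(sqrt(n) * (z_alpha - mu1))


mu0, n, alpha = 0.0, 25, 0.05   # illustrative values, not from the text
for mu1 in (0.0, 0.2, 0.5, 1.0):
    print(mu1, round(power(mu1, mu0, n, alpha), 4))
```

At $\mu_1 = \mu_0$ the power reduces to the size $\alpha$, and it increases with $\mu_1$, which is the unbiasedness property noted in the example.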

(c) Wald's Reduction of a Composite Hypothesis with one point in $\Omega - \omega$
to a Simple Hypothesis

In a composite hypothesis $\mathcal{H}(\omega; \omega \cup \theta_1)$, that is, one having one point $\theta_1$
in $\Omega - \omega$, Wald (1939) has approached the problem of constructing a
test for $\mathcal{H}$ essentially by replacing the family of p.d.f.'s $f_n(x_1, \ldots, x_n; \theta)$
for $\theta \in \omega$ by a p.d.f. $f_0(x_1, \ldots, x_n)$ obtained by averaging $f_n(x_1, \ldots, x_n; \theta)$
with respect to $\theta$ by means of an a priori c.d.f. $G(\theta)$ defined over $\omega$. More
specifically, let

(13.2.12)    $f_0(x_1, \ldots, x_n) = \int_{\omega} f_n(x_1, \ldots, x_n; \theta)\, dG(\theta)$

which is a p.d.f., and consider the problem of testing the hypothesis $\mathcal{H}_0$
that the population p.d.f. is $f_0(x_1, \ldots, x_n)$ against the alternative that it is
$f_n(x_1, \ldots, x_n; \theta_1)$.

Assuming that $f_0(x_1, \ldots, x_n)$ has all the properties ascribed to
$f_n(x_1, \ldots, x_n; \theta)$ in 13.2.1, then 13.2.1 can be applied to determine a
most powerful test of size $\alpha$ for testing $\mathcal{H}_0$. Wald's (1939) theorem
states that:


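13.2.2 If there exists a c.d.f. $G(\theta)$ for $\theta \in \omega$ such that the most powerful
test $W_{0\alpha}$ of size $\alpha$ for $\mathcal{H}_0$ is also of size $\alpha$ for testing $\mathcal{H}(\omega; \omega \cup \theta_1)$,
then

(13.2.13)    $\int_{\omega_0} dG(\theta) = 1$

where $\omega_0$ is the subset of $\omega$ for which

(13.2.14)    $\int_{W_{0\alpha}} f_n(x_1, \ldots, x_n; \theta)\, dx_1 \cdots dx_n = \alpha,$

and $W_{0\alpha}$ is the most powerful test for $\mathcal{H}(\omega; \omega \cup \theta_1)$.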

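To prove 13.2.2, note that if, for given $\alpha$, $W_{0\alpha}$ is a test for $\mathcal{H}(\omega; \omega \cup \theta_1)$,
then we must have

(13.2.15)    $\int_{W_{0\alpha}} f_n(x_1, \ldots, x_n; \theta)\, dx_1 \cdots dx_n \le \alpha, \quad \theta \in \omega$

and hence

(13.2.16)    $\int_{\omega} \int_{W_{0\alpha}} f_n(x_1, \ldots, x_n; \theta)\, dx_1 \cdots dx_n\, dG(\theta) \le \alpha \int_{\omega} dG(\theta).$

If we interchange the order of integration, it is evident that (13.2.13) must
hold in order for

$\int_{W_{0\alpha}} f_0(x_1, \ldots, x_n)\, dx_1 \cdots dx_n = \alpha,$

that is, in order for $W_{0\alpha}$ to be of size $\alpha$.

Now if $W_{0\alpha}$ is the most powerful test for $\mathcal{H}_0$, we must have

(13.2.17)    $\int_{W_{0\alpha}^*} f_n(x_1, \ldots, x_n; \theta_1)\, dx_1 \cdots dx_n \le \int_{W_{0\alpha}} f_n(x_1, \ldots, x_n; \theta_1)\, dx_1 \cdots dx_n$

where $W_{0\alpha}^*$ is any other test of size $\alpha$ for $\mathcal{H}_0$. But (13.2.17) is precisely
the condition for $W_{0\alpha}$ to be the most powerful test for $\mathcal{H}(\omega; \omega \cup \theta_1)$.

It should be noted that if $W_{0\alpha}$ were a similar set, that is, if the equality
holds in (13.2.15) for $\theta \in \omega$, then (13.2.13) would be a trivially true
statement. But, of course, in this instance the similarity property essentially
solves the problem of determining a critical region.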


13.3 THE LIKELIHOOD RATIO TEST 

Tests for composite hypotheses having optimal properties for finite
samples have been obtained for various special problems by an important
principle due to Neyman and Pearson (1928, 1933) called the likelihood
ratio principle, which is a natural extension of the test defined by (13.2.2)
to a composite hypothesis.

In large samples these likelihood ratio tests have optimal asymptotic 
properties under the same conditions for which asymptotically normally 
distributed maximum likelihood estimates exist. 

On the other hand, more general approaches have been made to the 
problem of testing composite hypotheses for finite samples by Lehmann 
(1950, 1959), Lehmann and Scheffe (1950), Lehmann and Stein (1948), 
and others. 

The only tests of composite hypotheses which we shall consider in detail
are likelihood ratio tests, particularly their asymptotic properties for large
values of $n$. Likelihood ratio tests for large values of $n$ are just as
fundamental in the theory of statistics as the large-sample theory of parametric
statistical estimation considered in Chapter 12.

Asymptotic distributions of likelihood ratio tests of composite
hypotheses in large samples when the hypothesis tested is true were first
discussed by Wilks (1938). Wald (1941a, 1941b) later made a detailed
study of these tests including asymptotic properties of bias, power, and
other aspects. Many of the results of the next few sections are slightly
less general versions of those contained in Wald's fundamental papers.
The methods used here, however, are perhaps simpler than those used
by Wald.

(a) Definition of a Likelihood Ratio Test 

Suppose $(x_1, \ldots, x_n)$ is a sample from the c.d.f. $F(x; \theta)$ where $\theta$ is
$r$-dimensional. For convenience we take $x$ as one-dimensional although
our results extend to the case where $x$ is $k$-dimensional with minor
notational changes. The likelihood element is

(13.3.1)    $dF_n(x_1, \ldots, x_n; \theta) = \prod_{\xi=1}^{n} dF(x_\xi; \theta).$

Now consider the hypothesis

(13.3.2)    $\mathcal{H}(\omega; \Omega)$

and let

(13.3.3)    $(dF_n)_\omega = \sup_{\theta \in \omega} dF_n(x_1, \ldots, x_n; \theta)$

(13.3.4)    $(dF_n)_\Omega = \sup_{\theta \in \Omega} dF_n(x_1, \ldots, x_n; \theta).$

In most ordinary applications, $(dF_n)_\omega$ and $(dF_n)_\Omega$ are simply maxima of
$dF_n$ for $\theta \in \omega$ and $\theta \in \Omega$ obtained by the usual differentiation procedure.
The likelihood ratio for testing $\mathcal{H}(\omega; \Omega)$, or more briefly, $\mathcal{H}$, is
defined by

(13.3.5)    $\lambda_{\mathcal{H}} = \frac{(dF_n)_\omega}{(dF_n)_\Omega}.$

The values of $\lambda_{\mathcal{H}}$ lie on the interval $[0, 1]$. The critical set $W_\alpha$ in $R_n$ for
testing $\mathcal{H}$ is the set for which $\lambda_{\mathcal{H}} < \lambda_\alpha$, $\lambda_\alpha$ being a constant, where

(13.3.6)    $\int_{W_\alpha} dF_n \le \alpha, \quad \theta \in \omega.$

The test defined by $\lambda_{\mathcal{H}} < \lambda_\alpha$ is essentially an extension of that defined by
(13.2.2) to the case where $\omega$ and $\Omega - \omega$ have more than one point each.

In addition to this fact, $\lambda_{\mathcal{H}}$ has intuitive appeal as a test. For if we are
comparing the "plausibility" of one value of $\theta$ against another, given that
we have a sample $(x_1, \ldots, x_n)$, we would intuitively be inclined to choose
that value of $\theta$ which gives the likelihood element the larger value. Thus,
if we cannot obtain an appreciably larger value of the likelihood element
by searching for a value of $\theta$ through the entire parameter space $\Omega$ than
we can by searching through the set $\omega$, our intuition will assess the evidence
as strongly favoring the proposition that the "most plausible" value of $\theta$
belongs to $\omega$, that is, that $\mathcal{H}$ is true. We shall see that, in the case of large
samples, this intuitive appeal can be rigorously supported under fairly
general conditions.

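Example. Suppose $(x_1, \ldots, x_n)$ is a sample from $N(\mu, \sigma^2)$, and that we wish
to determine the likelihood ratio test for the hypothesis $\mathcal{H}$ defined as follows:

$\Omega$: the half-plane for which $-\infty < \mu < +\infty$, $\sigma^2 > 0$
$\omega$: the subset of $\Omega$ (half-line) for which $\mu = \mu_0$.

We have

$dF_n = (2\pi\sigma^2)^{-n/2} \exp\left[-\frac{1}{2\sigma^2} \sum_{\xi=1}^{n} (x_\xi - \mu)^2\right] dx_1 \cdots dx_n.$

Maximizing $dF_n$ with respect to $(\mu, \sigma^2)$ in $\Omega$, and again with respect to $(\mu, \sigma^2)$ in
$\omega$, and taking the ratio of the two maxima in accordance with (13.3.5), we obtain
as the likelihood ratio for $\mathcal{H}$,

$\lambda_{\mathcal{H}} = \left(\frac{S_\Omega}{S_\omega}\right)^{n/2}$

where $S_\Omega$ is the minimum of the sum of squares $\sum_\xi (x_\xi - \mu)^2$ with respect to $\mu$,
that is,

$S_\Omega = \sum_{\xi=1}^{n} (x_\xi - \bar{x})^2$

whereas

$S_\omega = \sum_{\xi=1}^{n} (x_\xi - \mu_0)^2.$

Note that $\lambda_{\mathcal{H}}$ can be written as follows

$\lambda_{\mathcal{H}} = \left[\frac{1}{1 + t^2/(n-1)}\right]^{n/2}$

where

$t = \sqrt{n}\,(\bar{x} - \mu_0)/s,$

$\bar{x}$ and $s^2$ being the sample mean and variance defined in (8.2.1) and (8.2.6). The
random variable $t$ is, of course, the "Student" ratio defined in (8.4.16). Thus, the
critical region of size $\alpha$ of the likelihood ratio test is the portion of the sample
space of $(x_1, \ldots, x_n)$ for which $\lambda_{\mathcal{H}} < \lambda_\alpha$, with

$P(\lambda_{\mathcal{H}} < \lambda_\alpha) = \alpha$, for $\mu = \mu_0$,

where

$\lambda_\alpha = \left[\frac{1}{1 + t_\alpha^2/(n-1)}\right]^{n/2}$

and $t_\alpha$ is chosen so that

$2 \int_{t_\alpha}^{\infty} f_{n-1}(t)\, dt = \alpha,$

$f_{n-1}(t)$ being the p.d.f. of the "Student" distribution $S(n - 1)$ defined by (7.8.4).

The likelihood ratio $\lambda_{\mathcal{H}}$ for testing the hypothesis $\mathcal{H}$ is therefore equivalent
to the "Student" $t$-test.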
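The identity between $\lambda_{\mathcal{H}}$ and the "Student" ratio in this example is easy to confirm numerically. The sketch below is illustrative only: the data values and the function names are assumptions, and $s^2$ is taken with divisor $n - 1$. It computes $\lambda_{\mathcal{H}} = (S_\Omega/S_\omega)^{n/2}$ directly and again from $t$.

```python
from math import sqrt


def sums_of_squares(xs, mu0):
    """Return (S_Omega, S_omega): sums of squares about xbar and about mu0."""
    n = len(xs)
    xbar = sum(xs) / n
    s_Omega = sum((x - xbar) ** 2 for x in xs)
    s_omega = sum((x - mu0) ** 2 for x in xs)
    return s_Omega, s_omega


def likelihood_ratio(xs, mu0):
    """lambda = (S_Omega / S_omega) ** (n / 2)."""
    s_Omega, s_omega = sums_of_squares(xs, mu0)
    return (s_Omega / s_omega) ** (len(xs) / 2.0)


def student_t(xs, mu0):
    """t = sqrt(n) * (xbar - mu0) / s, with s^2 = S_Omega / (n - 1)."""
    n = len(xs)
    xbar = sum(xs) / n
    s_Omega, _ = sums_of_squares(xs, mu0)
    return sqrt(n) * (xbar - mu0) / sqrt(s_Omega / (n - 1))


xs = [4.9, 5.6, 5.1, 6.2, 5.8, 5.3, 6.0]   # hypothetical sample
mu0 = 5.0
n = len(xs)
lam = likelihood_ratio(xs, mu0)
t = student_t(xs, mu0)
lam_from_t = (1.0 / (1.0 + t * t / (n - 1))) ** (n / 2.0)
print(lam, lam_from_t, t)
```

Since $\lambda_{\mathcal{H}}$ is a decreasing function of $t^2$, rejecting for small $\lambda_{\mathcal{H}}$ is the same as rejecting for large $|t|$.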


(b) The Likelihood Ratio Test in Normal Regression Theory 

In small samples one of the most important applications of the likelihood 
ratio test is in the testing of various hypotheses concerning the parameters 
of a normal distribution. The preceding example is one important 
illustration. In many of these applications the exact sampling distribution 
of the likelihood ratio can be found for finite samples of size n. Most of 
these tests are quite straightforward and will not be further considered 
except in the problems at the end of this chapter. 

It is perhaps worthwhile, however, to consider the likelihood ratio test
for testing the hypothesis that some of the regression parameters in normal
regression theory have zero values. Referring to Section 10.3(c), suppose
$(y_\xi \mid x_{1\xi}, \ldots, x_{k\xi})$, $\xi = 1, \ldots, n$, are independent random variables having normal
distributions $N(\beta_1 x_{1\xi} + \cdots + \beta_k x_{k\xi}, \sigma^2)$, $\xi = 1, \ldots, n$, where $(x_{i\xi},\ i = 1, \ldots, k)$
are linearly independent fixed vectors (that is, the matrix $\|a_{ij}\|$
in (10.3.8) is nonsingular). Let $\mathcal{H}$ be the hypothesis for which $\Omega$ and
$\omega$ are defined as follows:

(13.3.7)    $\Omega$: the half $(k + 1)$-dimensional Euclidean space
            for which $-\infty < \beta_i < +\infty$, $i = 1, \ldots, k$, $\sigma^2 > 0$.

            $\omega$: the subset of $\Omega$ for which $\beta_{k'+1} = \cdots = \beta_k = 0$, $k' < k$.

The hypothesis thus defined is sometimes called the general linear
hypothesis of normal regression theory. Our sample consists of what may
be regarded as the conditional random variables $(y_\xi \mid x_{1\xi}, \ldots, x_{k\xi})$,
$\xi = 1, \ldots, n$, and the likelihood of the parameters $\beta_1, \ldots, \beta_k, \sigma^2$ is
given by

(13.3.8)    $dF_n = (2\pi\sigma^2)^{-n/2} \exp\left[-\frac{1}{2\sigma^2} \sum_{\xi=1}^{n} (y_\xi - \beta_1 x_{1\xi} - \cdots - \beta_k x_{k\xi})^2\right] dy_1 \cdots dy_n.$

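Determining $(dF_n)_\Omega$ and $(dF_n)_\omega$ by the usual differentiation procedure,
we find

(13.3.9)    $\lambda_{\mathcal{H}} = \left(\frac{S_\Omega}{S_\omega}\right)^{n/2}$

where $S_\Omega$ is the minimum of the sum of squares

(13.3.10)    $\sum_{\xi=1}^{n} (y_\xi - \beta_1 x_{1\xi} - \cdots - \beta_k x_{k\xi})^2$

with respect to $\beta_1, \ldots, \beta_k$, and $S_\omega$ is the minimum of

(13.3.11)    $\sum_{\xi=1}^{n} (y_\xi - \beta_1 x_{1\xi} - \cdots - \beta_{k'} x_{k'\xi})^2$

with respect to $\beta_1, \ldots, \beta_{k'}$. Referring to Section 10.3(b) it will be seen
that $S_\Omega$ is identical with $S_1$ as given by (10.3.27), that is,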

(13.3.12) Sn = 


where the i,j = \,... ,k, and are defined in (10.3.8), (10.3.11), 
and (10.3.28). la^xl is, of course, the determinant of the matrix 
whereas is defined by (10.3.28). 

Similarly, we have 


(13.3.13) 


S„ 




\a,r\ 


i^cre |a<jxjl and are similar to and |ay| with i',j' = I,... ,k'. 
It follows by an argument based on Cochran's theorem 8.4.4 that, if $\mathcal{H}$
is true, $S_\Omega/\sigma^2$ and $(S_\omega - S_\Omega)/\sigma^2$ are independently distributed according
to chi-square distributions $C(n - k)$ and $C(k - k')$, respectively. We
shall not present the details since the argument is similar to that
used in establishing the fact that $S_1/\sigma^2$ and $S_2/\sigma^2$ are independently
distributed according to chi-square distributions $C(n - k)$ and $C(k)$,
where $S_1$ and $S_2$ are given by (10.3.30).
Thus the likelihood ratio in (13.3.9) may be written as

(13.3.14)    $\lambda_{\mathcal{H}} = \left[\frac{1}{1 + \frac{k - k'}{n - k}\,\mathcal{F}}\right]^{n/2}$

where

(13.3.15)    $\mathcal{F} = \frac{(n - k)(S_\omega - S_\Omega)}{(k - k')\,S_\Omega}$

has the Snedecor distribution $S(k - k', n - k)$ whose p.e. is defined by
(7.8.9). Since there is a one-to-one correspondence between $\lambda_{\mathcal{H}}$ and $\mathcal{F}$
it therefore follows that the likelihood ratio test for $\mathcal{H}$ is equivalent to
the $\mathcal{F}$ test. The critical region $\lambda_{\mathcal{H}} < \lambda_\alpha$ in the sample space of $\lambda_{\mathcal{H}}$, where
$\lambda_\alpha$ is the $100\alpha\%$ point of the $\lambda_{\mathcal{H}}$ distribution, corresponds to the critical
region $\mathcal{F} > [(n - k)/(k - k')](\lambda_\alpha^{-2/n} - 1)$ in the sample space of $\mathcal{F}$.
The $\mathcal{F}$ test just described is the Model I analysis of variance test in its
most general form. Daly (1940) has shown that this test is unbiased.

We therefore have the following basic result concerning the general 
linear hypothesis in normal regression theory: 

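13.3.1. Suppose $(y_\xi \mid x_{1\xi}, \ldots, x_{k\xi})$, $\xi = 1, \ldots, n$, are independent random variables having
the normal distributions $N(\beta_1 x_{1\xi} + \cdots + \beta_k x_{k\xi}, \sigma^2)$, $\xi = 1, \ldots, n$,
where the matrix $\|a_{ij}\|$, $i, j = 1, \ldots, k$, defined by (10.3.8) is
nonsingular. The likelihood ratio $\lambda_{\mathcal{H}}$ for testing the hypothesis
$\mathcal{H}$ specified by (13.3.7) is given by (13.3.14) where $\mathcal{F}$ is given by
(13.3.15). Furthermore, if $\mathcal{H}$ is true, $[(n - k)/(k - k')](\lambda_{\mathcal{H}}^{-2/n} - 1)$
has the Snedecor distribution $S(k - k', n - k)$.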
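The one-to-one correspondence between $\lambda_{\mathcal{H}}$ and $\mathcal{F}$ can be illustrated in the simplest case $k = 2$, $k' = 1$, taking $x_{1\xi} = 1$ (an intercept) and testing $\beta_2 = 0$. This is a sketch under those assumptions; the data and function name are hypothetical. In this case $S_\omega$ is the sum of squares of $y$ about its mean and $S_\Omega$ is the residual sum of squares of the fitted line.

```python
def regression_sums_of_squares(xs, ys):
    """Return (S_Omega, S_omega) for y = beta_1 + beta_2 * x with k = 2, k' = 1."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    syy = sum((y - ybar) ** 2 for y in ys)
    s_omega = syy                    # minimum of (13.3.11): intercept only
    s_Omega = syy - sxy ** 2 / sxx   # minimum of (13.3.10): intercept and slope
    return s_Omega, s_omega


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical fixed x-values
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.3]   # hypothetical observations
n, k, k_prime = len(xs), 2, 1

s_Omega, s_omega = regression_sums_of_squares(xs, ys)
lam = (s_Omega / s_omega) ** (n / 2.0)                                 # (13.3.9)
f_stat = (n - k) * (s_omega - s_Omega) / ((k - k_prime) * s_Omega)     # (13.3.15)
f_from_lam = (n - k) / (k - k_prime) * (lam ** (-2.0 / n) - 1.0)       # from (13.3.14)
print(lam, f_stat, f_from_lam)
```

For larger $k$ the two minima would be obtained by solving the normal equations, but the relation between $\lambda_{\mathcal{H}}$ and $\mathcal{F}$ is unchanged.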

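The general linear hypothesis is more comprehensive than may appear
at first glance. More generally, we may wish to test the hypothesis $\mathcal{H}'$
that $\beta_{k'+1} = \beta_{k'+1,0}, \ldots, \beta_k = \beta_{k0}$ rather than $\beta_{k'+1} = \cdots = \beta_k = 0$ as
now specified by $\omega$ in (13.3.7). In this case, we may introduce the random
variable $y'_\xi = y_\xi - \beta_{k'+1,0}\, x_{k'+1,\xi} - \cdots - \beta_{k0}\, x_{k\xi}$ in determining $(dF_n)_\omega$.
We would thus have independent random variables $y'_\xi$, $\xi = 1, \ldots, n$, with
normal distributions $N(\beta_1 x_{1\xi} + \cdots + \beta_{k'} x_{k'\xi}, \sigma^2)$ under $\omega$. The hypothesis $\mathcal{H}'$
would be specified by (13.3.7) with no change in $\Omega$ but with $\beta_{k'+1} = \beta_{k'+1,0},
\ldots, \beta_k = \beta_{k0}$ in $\omega$. The likelihood ratio $\lambda_{\mathcal{H}'}$ is identical in
structure to $\lambda_{\mathcal{H}}$ with $y_\xi$ replaced by $y'_\xi$ in the matrices entering (13.3.12) and (13.3.13).
The distribution theory of $\lambda_{\mathcal{H}'}$ when $\mathcal{H}'$ is true is exactly the same as the
distribution theory of $\lambda_{\mathcal{H}}$ when $\mathcal{H}$ is true.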

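Again, we may wish to test the hypothesis $\mathcal{H}''$ that there exist certain
linearly independent linear constraints among the $\beta_1, \ldots, \beta_k$ of form

$\sum_{i=1}^{k} c_{iu}\beta_i = \gamma_{u0}, \quad u = k' + 1, \ldots, k$

where the $c_{iu}$ and $\gamma_{u0}$ are specified numbers. In this case we may apply
the transformation of regression coefficients defined by

$\beta_1 = \gamma_1, \ldots, \beta_{k'} = \gamma_{k'},$

$\sum_{i=1}^{k} c_{iu}\beta_i = \gamma_u, \quad u = k' + 1, \ldots, k$

to the regression function $\beta_1 x_{1\xi} + \cdots + \beta_k x_{k\xi}$ and obtain a regression
function of form $\gamma_1 z_{1\xi} + \cdots + \gamma_k z_{k\xi}$ where $z_{1\xi}, \ldots, z_{k\xi}$ are linear functions of
$x_{1\xi}, \ldots, x_{k\xi}$ with coefficients depending on the $c_{iu}$. The
hypothesis $\mathcal{H}''$ is thus reduced to an hypothesis of type $\mathcal{H}'$ discussed above,
with $\gamma_1, \ldots, \gamma_k$ playing the role of $\beta_1, \ldots, \beta_k$ and $z_{1\xi}, \ldots, z_{k\xi}$ playing the
role of $x_{1\xi}, \ldots, x_{k\xi}$, $\xi = 1, \ldots, n$. The sampling theory of $\lambda_{\mathcal{H}''}$ when $\mathcal{H}''$
is true is exactly the same as that of $\lambda_{\mathcal{H}}$ when $\mathcal{H}$ is true.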

13.4 ASYMPTOTIC DISTRIBUTION OF LIKELIHOOD 
RATIO IN LARGE SAMPLES 

We now turn to large-sample problems. First we shall consider the case
of a simple hypothesis

(13.4.1)    $\mathcal{H}(\theta_0; \Omega)$

which will be referred to as $\mathcal{H}$ in this section, where $\theta$ is a one-dimensional
parameter, the parameter space $\Omega$ being an interval on the real axis $R_1$
containing some (open) interval $\Omega_0$ containing $\theta_0$. In large-sample theory
the only part of $\Omega$ which plays an essential role in the asymptotic sampling
theory of the likelihood ratio test is $\Omega_0$. We shall assume that for a sample
of size $n$, there exists a maximum likelihood estimator $\hat\theta_n$ for $\theta_0$, such that
the sequence of estimators $\hat\theta_n$, $n = 1, 2, \ldots$ converges in probability to $\theta_0$.
The corresponding sequence of likelihood ratios, therefore, is given by

(13.4.2)    $\lambda_{\mathcal{H}} = \frac{\prod_{\xi=1}^{n} dF(x_\xi; \theta_0)}{\prod_{\xi=1}^{n} dF(x_\xi; \hat\theta_n)}, \quad n = 1, 2, \ldots.$

We shall show that under conditions to be stated below

$-2 \log \lambda_{\mathcal{H}}, \quad n = 1, 2, \ldots$

converges in distribution to the chi-square distribution $C(1)$ when $\mathcal{H}$ is
true.

We shall assume that $F(x; \theta)$ is regular with respect to its first and second
$\theta$-derivatives for $\theta$ in $\Omega_0$ (see Section 12.1). Under these conditions

(13.4.3)    $H(\theta_0, \theta) = \int_{-\infty}^{\infty} \log dF(x; \theta)\, dF(x; \theta_0)$

can be differentiated twice under the integral sign with respect to $\theta$, thus
yielding

(13.4.4)    $H'(\theta_0, \theta_0) = 0, \qquad H''(\theta_0, \theta_0) = -B^2(\theta_0, \theta_0),$

where $B^2(\theta_0, \theta_0)$ is defined in (12.1.8).

Therefore $H(\theta_0, \theta)$ has a relative maximum $H(\theta_0, \theta_0)$ at $\theta = \theta_0$. The
quantity $H(\theta_0, \theta)$ is of basic importance in statistical information theory as
developed by Kullback (1959). In general, if $F_1(x)$, $F_2(x)$ and $G(x)$ are three
c.d.f.'s, absolutely continuous with respect to each other, Kullback's
mean information or information integral for discriminating $F_1(x)$ against
$F_2(x)$ per observation from $G(x)$ is defined by

$I(1:2) = \int_{-\infty}^{\infty} \log \frac{dF_1(x)}{dF_2(x)}\, dG(x).$

Hence

$H(\theta_0, \theta_0) - H(\theta_0, \theta_1) = \int_{-\infty}^{\infty} \log \frac{dF(x; \theta_0)}{dF(x; \theta_1)}\, dF(x; \theta_0)$

is, in Kullback's sense, the information integral for discriminating $F(x; \theta_0)$
against $F(x; \theta_1)$ per observation from $F(x; \theta_0)$.

Remark. A particular case of the function $H(\theta_0, \theta)$ first arose in statistical
mechanics. Suppose $x$ is a point whose components represent the coordinates
and momenta of a given gas molecule. The space of $x$ is called the phase space
of the system. If $F(x; t)$ represents the c.d.f. of $x$ in the aggregate of molecules
in this system at time $t$, then the integral $H(t, t)$, which would be obtained by
using $F(x; t)$ in (13.4.3), is the $H$-function originally introduced by
Boltzmann (1910) to approximate $-\log P$ (except for an additive constant)
where

$P = \frac{n!}{n_1! \cdots n_N!} \left(\frac{1}{N}\right)^{n},$

the probability that for $n$ molecules in the system, there will be $n_i$ molecules in
cell $E_i$, $i = 1, 2, \ldots, N$, at time $t$, where $E_1, \ldots, E_N$ are disjoint "cells"
in the phase space (having equal probabilities $1/N$) which constitute the entire
phase space. $H(t, t)$ essentially measures the deviation of the system from a
"most probable" or "equilibrium" state.

Functions of type $H(\theta_0, \theta)$ also play an important role in communication
theory as developed by Shannon (1948).

Recalling from Section 12.2(c) that $B^2(\theta_0, \theta_0)$ has been defined by Fisher
as the amount of information pertaining to $\theta_0$ per observation from $F(x; \theta_0)$,
it will be noted from (13.4.4) that Fisher's amount of information per
observation is the second derivative of Kullback's information integral
$[H(\theta_0, \theta_0) - H(\theta_0, \theta)]$ at $\theta = \theta_0$. Stated another way, $B^2(\theta_0, \theta_0)$ essentially
measures the curvature of $H(\theta_0, \theta)$ at its maximum value which occurs at
$\theta = \theta_0$.
Summarizing, we may say that:

13.4.1 If $F(x; \theta)$ is regular with respect to its second $\theta$-derivative for $\theta$
in $\Omega_0$, then $H(\theta_0, \theta)$ has a relative maximum of $H(\theta_0, \theta_0)$ at $\theta = \theta_0$.
The second derivative of $[H(\theta_0, \theta_0) - H(\theta_0, \theta)]$ (Kullback's
information integral) at $\theta = \theta_0$ is $B^2(\theta_0, \theta_0)$ [Fisher's amount of information
pertaining to $\theta_0$ per observation from $F(x; \theta_0)$].
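The relation in 13.4.1 can be checked numerically for a particular family. The sketch below assumes the family $N(\theta, 1)$, for which the amount of information per observation is 1; the family, the integration range, and the step sizes are illustrative choices and not part of the text. It evaluates $H(\theta_0, \theta)$ of (13.4.3) by the trapezoidal rule and then the information integral and its second derivative at $\theta = \theta_0$.

```python
from math import exp, log, pi


def log_density(x, theta):
    """Logarithm of the N(theta, 1) density at x."""
    return -0.5 * log(2.0 * pi) - 0.5 * (x - theta) ** 2


def H(theta0, theta, half_width=12.0, steps=6000):
    """H(theta0, theta) of (13.4.3), approximated by the trapezoidal rule."""
    a = theta0 - half_width
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = a + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * log_density(x, theta) * exp(log_density(x, theta0))
    return total * h


theta0, theta1, d = 0.3, 1.1, 0.05

# Kullback's information integral for discriminating theta0 against theta1.
kullback = H(theta0, theta0) - H(theta0, theta1)

# Second derivative of [H(theta0, theta0) - H(theta0, theta)] at theta = theta0,
# by a central second difference.
curvature = -(H(theta0, theta0 + d) - 2.0 * H(theta0, theta0)
              + H(theta0, theta0 - d)) / d ** 2

print(kullback, (theta0 - theta1) ** 2 / 2.0, curvature)
```

For this family $H(\theta_0, \theta)$ is exactly quadratic in $\theta$, so the information integral equals $(\theta_0 - \theta_1)^2/2$ and the curvature equals 1 up to numerical error.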

In a sample $(x_1, \ldots, x_n)$ from the c.d.f. $F(x; \theta_0)$, the quantity

(13.4.5)    $\frac{1}{n} \sum_{\xi=1}^{n} \log dF(x_\xi; \theta_0)$

is the mean of a sample of size $n$ from a distribution having mean $H(\theta_0, \theta_0)$,
and hence by 9.1.1 converges in probability to $H(\theta_0, \theta_0)$, which exists if
$F(x; \theta)$ is regular in its first $\theta$-derivative in $\Omega_0$. Also, under these conditions,
$\hat\theta(x_1, \ldots, x_n)$, $n = 1, 2, \ldots$ converges in probability to $\theta_0$ as $n \to \infty$, as we
have seen in 12.3.2. Furthermore, it follows from 4.3.8 that

(13.4.6)    $\frac{1}{n} \sum_{\xi=1}^{n} \log dF(x_\xi; \hat\theta)$

converges in probability to $H(\theta_0, \theta_0)$ as $n \to \infty$.

Therefore

13.4.2 If $(x_1, \ldots, x_n)$ is a sample from $F(x; \theta_0)$, where $F(x; \theta)$ is regular
in its first $\theta$-derivative in $\Omega_0$, then both

$\frac{1}{n} \sum_{\xi=1}^{n} \log dF(x_\xi; \theta_0)$ and $\frac{1}{n} \sum_{\xi=1}^{n} \log dF(x_\xi; \hat\theta)$

converge in probability to $H(\theta_0, \theta_0)$ as $n \to \infty$.

If $\hat\theta$ satisfies the stronger condition of being asymptotically normally
distributed, then we have the following result concerning the asymptotic
distribution of $-2 \log \lambda_{\mathcal{H}}$ when $\mathcal{H}$ is true:

13.4.3 If $\hat\theta$ is asymptotically normally distributed in accordance with
12.3.3, then

(13.4.7)    $\lim_{n \to \infty} P(-2 \log \lambda_{\mathcal{H}} < t \mid \theta_0) = \frac{1}{\sqrt{2\pi}} \int_0^t u^{-1/2} e^{-u/2}\, du,$

that is, if $\mathcal{H}$ is true, $-2 \log \lambda_{\mathcal{H}}$, $n = 1, 2, \ldots$ converges in
distribution to the chi-square distribution $C(1)$.





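To establish 13.4.3, we refer to the assumptions of 12.3.3. Since $\hat\theta$
converges almost certainly to $\theta_0$ as $n \to \infty$, it follows that for an arbitrary
$\varepsilon > 0$ there exists an $n_\varepsilon$ such that the probability exceeds $1 - \varepsilon$ that the
following equality holds

(13.4.8)    $\sum_{\xi=1}^{n} \log dF(x_\xi; \theta_0) = \sum_{\xi=1}^{n} \log dF(x_\xi; \hat\theta)
             + \tfrac{1}{2}(\theta_0 - \hat\theta)^2 \sum_{\xi=1}^{n} \left[\frac{\partial^2}{\partial\theta^2} \log dF(x_\xi; \theta)\right]_{\theta = \theta^*}$

for all $n > n_\varepsilon$, where $\theta^*$ is a random variable such that $|\theta_0 - \theta^*| < |\theta_0 - \hat\theta|$
(the first-order term vanishes since $\hat\theta$ satisfies the likelihood equation).
Now (13.4.8) can be rewritten as

(13.4.9)    $-2 \sum_{\xi=1}^{n} \log\left(\frac{dF(x_\xi; \theta_0)}{dF(x_\xi; \hat\theta)}\right) = [\sqrt{n}(\theta_0 - \hat\theta)]^2 \left\{-\frac{1}{n} \sum_{\xi=1}^{n} \left[\frac{\partial^2}{\partial\theta^2} \log dF(x_\xi; \theta)\right]_{\theta = \theta^*}\right\}.$

It should be noted that the left side of (13.4.9) is $-2 \log \lambda_{\mathcal{H}}$. The fact that
the probability exceeds $1 - \varepsilon$ that (13.4.9) holds for all $n > n_\varepsilon$ implies
that the left- and right-hand sides of (13.4.9) are sequences of random
variables for $n = 1, 2, \ldots$ which converge together in distribution to the
chi-square distribution $C(1)$. The expression

(13.4.10)    $-\frac{1}{n} \sum_{\xi=1}^{n} \left[\frac{\partial^2}{\partial\theta^2} \log dF(x_\xi; \theta)\right]_{\theta = \theta^*}$

converges in probability to $B^2(\theta_0, \theta_0)$. Furthermore, $\sqrt{n}\, B(\theta_0, \theta_0)(\theta_0 - \hat\theta)$
converges in distribution to $N(0, 1)$.

Therefore, the sequence of random variables on the right of (13.4.9)
converges in distribution to the chi-square distribution $C(1)$. Thus,
we conclude that the expression on the left of (13.4.9), namely, $-2 \log \lambda_{\mathcal{H}}$,
converges in probability to a random variable having the chi-square
distribution $C(1)$, thereby completing the argument for 13.4.3.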
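The conclusion of 13.4.3 can be examined by simulation. The sketch below is an illustration under assumptions not made in the text: the population has p.d.f. $\theta e^{-\theta x}$, $x > 0$, for which $\hat\theta = 1/\bar{x}$ and $-2 \log \lambda_{\mathcal{H}} = 2n[\theta_0 \bar{x} - 1 - \log(\theta_0 \bar{x})]$; the sample size, number of repetitions, and seed are arbitrary. It estimates the probability that $-2 \log \lambda_{\mathcal{H}}$ exceeds the upper 5% point of $C(1)$ when $\mathcal{H}$ is true.

```python
import random
from math import log
from statistics import NormalDist


def neg2_log_lambda(xs, theta0):
    """-2 log(lambda) for the density theta * exp(-theta * x), with MLE 1 / xbar."""
    n = len(xs)
    r = theta0 * sum(xs) / n
    return 2.0 * n * (r - 1.0 - log(r))


def rejection_rate(theta_true, theta0, n, reps, alpha, rng):
    """Fraction of simulated samples with -2 log(lambda) above the C(1) point."""
    # Upper alpha point of chi-square(1) is the square of the normal
    # upper alpha/2 point.
    chi2_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    hits = 0
    for _ in range(reps):
        xs = [rng.expovariate(theta_true) for _ in range(n)]
        if neg2_log_lambda(xs, theta0) > chi2_alpha:
            hits += 1
    return hits / reps


rng = random.Random(12345)   # fixed seed so the run is reproducible
rate = rejection_rate(theta_true=1.0, theta0=1.0, n=200, reps=2000,
                      alpha=0.05, rng=rng)
print(rate)
```

The estimated rate should lie near $\alpha = 0.05$, with a sampling error of roughly $\pm 0.01$ at this number of repetitions.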


13.5 CONSISTENCY OF LIKELIHOOD RATIO TEST 
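Now suppose $W_\alpha$ is the set in $R_n$ for which

(13.5.1)    $-2 \log \lambda_{\mathcal{H}} > \chi^2_\alpha$

where $\mathcal{H}$ is the hypothesis $\mathcal{H}(\theta_0; \Omega_0)$, and $\chi^2_\alpha$ is the number for which
$P(\chi^2 > \chi^2_\alpha) = \alpha$, this probability being computed from the chi-square
distribution $C(1)$. If $F(x; \theta)$ is regular with respect to its second
$\theta$-derivative in $\Omega_0$, then we know from 13.4.3 that

(13.5.2)    $\lim_{n \to \infty} P(W_\alpha \mid \theta_0) = \alpha.$

Now suppose $\theta_1 \ne \theta_0$ is any point in $\Omega_0$, and consider

(13.5.3)    $\lim_{n \to \infty} P(W_\alpha \mid \theta_1).$

We can write

(13.5.4)    $\lambda_{\mathcal{H}} = \lambda_{\mathcal{H}_1} \cdot \nu$

where

$\lambda_{\mathcal{H}_1} = \frac{\prod_{\xi=1}^{n} dF(x_\xi; \theta_1)}{\prod_{\xi=1}^{n} dF(x_\xi; \hat\theta_n)}, \qquad \nu = \frac{\prod_{\xi=1}^{n} dF(x_\xi; \theta_0)}{\prod_{\xi=1}^{n} dF(x_\xi; \theta_1)}.$

If we let

(13.5.5)    $-2 \log \lambda_{\mathcal{H}_1} = u_n, \qquad -2 \log \nu = n v_n,$

then $W_\alpha$ is the set in $R_n$ for which

(13.5.6)    $u_n + n v_n > \chi^2_\alpha.$

If the true value of $\theta$ is $\theta_1$, then it follows from 13.4.3 with $\theta_0$ replaced
by $\theta_1$ that $u_n$, $n = 1, 2, \ldots$, is a sequence of random variables converging
in distribution to the chi-square distribution $C(1)$. We may write

(13.5.7)    $v_n = 2\left[\frac{1}{n} \sum_{\xi=1}^{n} \log dF(x_\xi; \theta_1) - \frac{1}{n} \sum_{\xi=1}^{n} \log dF(x_\xi; \theta_0)\right].$

It follows from 9.1.1 that if $\theta_1$ is the true value of $\theta$, $\frac{1}{n} \sum_{\xi=1}^{n} \log dF(x_\xi; \theta_1)$
and $\frac{1}{n} \sum_{\xi=1}^{n} \log dF(x_\xi; \theta_0)$ converge in probability to $H(\theta_1, \theta_1)$ and $H(\theta_1, \theta_0)$
respectively. But we know from 13.4.1 that $H(\theta_1, \theta_1) > H(\theta_1, \theta_0)$. Therefore
the sequence of random variables $v_n$, $n = 1, 2, \ldots$ converges in
probability to the positive number $v_0 = 2[H(\theta_1, \theta_1) - H(\theta_1, \theta_0)]$. Since
$u_n$, $n = 1, 2, \ldots$, converges in distribution to the chi-square distribution
$C(1)$, it is seen that

(13.5.8)    $\lim_{n \to \infty} P(u_n + n v_n > \chi^2_\alpha) = 1$

which is equivalent to the statement

(13.5.9)    $\lim_{n \to \infty} P(W_\alpha \mid \theta = \theta_1) = 1,$

that is, $W_\alpha$ is a consistent test for $\mathcal{H}(\theta_0; \Omega)$.

Therefore, for $\theta$ in $\Omega$ the sequence of power functions

(13.5.10)    $P(W_\alpha \mid \theta), \quad n = 1, 2, \ldots$

converges to the function

(13.5.11)    $\eta(\theta) = \alpha$ for $\theta = \theta_0$; $\quad \eta(\theta) = 1$ for $\theta \ne \theta_0$,

which means that for sufficiently large $n$, $W_\alpha$, that is, $-2 \log \lambda_{\mathcal{H}} > \chi^2_\alpha$, is a
consistent test for $\mathcal{H}$.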
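The two ingredients of the consistency argument, the convergence of $v_n$ to $v_0 > 0$ and the resulting growth of the power, can be seen in a small simulation. As before this is a sketch under assumptions not in the text: the p.d.f. is taken to be $\theta e^{-\theta x}$, for which $H(\theta_1, \theta) = \log\theta - \theta/\theta_1$, and the parameter values, sample sizes, and seed are arbitrary.

```python
import random
from math import log
from statistics import NormalDist


def v_n(xs, theta0, theta1):
    """v_n of (13.5.7) for the density theta * exp(-theta * x)."""
    xbar = sum(xs) / len(xs)
    return 2.0 * (log(theta1 / theta0) - (theta1 - theta0) * xbar)


def v_0(theta0, theta1):
    """Limit 2[H(theta1, theta1) - H(theta1, theta0)] for the same density."""
    return 2.0 * (log(theta1 / theta0) - 1.0 + theta0 / theta1)


def neg2_log_lambda(xs, theta0):
    """-2 log(lambda) for testing theta = theta0, with MLE 1 / xbar."""
    n = len(xs)
    r = theta0 * sum(xs) / n
    return 2.0 * n * (r - 1.0 - log(r))


def power(theta1, theta0, n, reps, alpha, rng):
    """Monte Carlo estimate of P(W_alpha | theta1) for sample size n."""
    chi2_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0) ** 2
    hits = sum(
        neg2_log_lambda([rng.expovariate(theta1) for _ in range(n)], theta0) > chi2_alpha
        for _ in range(reps)
    )
    return hits / reps


rng = random.Random(2024)          # fixed seed for reproducibility
theta0, theta1 = 1.0, 1.5          # hypothetical null and true values
big_sample = [rng.expovariate(theta1) for _ in range(20000)]
print(v_n(big_sample, theta0, theta1), v_0(theta0, theta1))

powers = {n: power(theta1, theta0, n, 400, 0.05, rng) for n in (10, 50, 200)}
print(powers)
```

Because $n v_n$ grows without bound while $u_n$ remains bounded in probability, the estimated power approaches 1 as $n$ increases.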

We summarize as follows: 

13.5.1 If $F(x; \theta)$ is regular with respect to its first and second $\theta$-derivatives,
the likelihood ratio test defined by (13.5.1) is a consistent test for $\mathcal{H}(\theta_0; \Omega)$.

13.6 ASYMPTOTIC POWER OF LIKELIHOOD RATIO TEST 

We have seen from 13.4.3 that for large samples the likelihood ratio $\lambda_{\mathcal{H}}$
determines a test for the hypothesis $\mathcal{H}(\theta_0; \Omega)$ for which the probability of
a Type I error is computed from a chi-square distribution with one degree
of freedom, and from 13.5.1 that the test is consistent, that is, the power
function of the test as $n \to \infty$ converges to $\eta(\theta)$ as defined in (13.5.11).
We shall now show that under certain conditions the power function of
no other consistent test for $\mathcal{H}(\theta_0; \Omega)$ of size $\alpha$ converges to $\eta(\theta)$ more
rapidly than that for the test determined by $\lambda_{\mathcal{H}}$.

For this purpose, it is sufficient to return to the regular estimating
function $g_n(x_1, \ldots, x_n; \theta)$ defined in Section 12.5, which, as we have seen,
has properties sufficient to insure that if $\mathcal{H}$ is true the two sequences of
random variables given in (12.5.11) and (12.5.12) converge together in
distribution to the normal distribution $N(0, 1)$, as $n \to \infty$. Hence, the sequence

(13.6.1)    $n g_n^2(x_1, \ldots, x_n; \theta_0), \quad n = 1, 2, \ldots$

converges in distribution to the chi-square distribution $C(1)$. Let $W_\alpha^*$ be
the test defined by

(13.6.2)    $n g_n^2(x_1, \ldots, x_n; \theta_0) > \chi^2_\alpha$

where $\chi^2_\alpha$ is the same number defined in (13.5.1). Therefore, we have

(13.6.3)    $\lim_{n \to \infty} P(W_\alpha^* \mid \theta_0) = \alpha.$
n~* 00 

Now it follows from the properties of $g_n(x_1, \ldots, x_n; \theta)$ as a regular
estimating function for $\theta_0$ that, if the true value of $\theta$ is $\theta_1$, where $\theta_1$
is also a point in $\Omega_0$,

(13.6.4)    $n g_n^2(x_1, \ldots, x_n; \theta_0), \quad n = 1, 2, \ldots$

and

(13.6.5)    $w_n^* = \{g_n'(x_1, \ldots, x_n; \theta^*)[\sqrt{n}(\theta_1 - \tilde\theta) + \sqrt{n}(\theta_0 - \theta_1)]\}^2, \quad n = 1, 2, \ldots$

converge together in distribution if either does. But we may write

(13.6.6)    $w_n^* = u_n^* + n v_n^*$

where

(13.6.7)    $u_n^* = \{g_n'(x_1, \ldots, x_n; \theta^*)\,\sqrt{n}(\theta_1 - \tilde\theta)\}^2, \qquad v_n^* = \frac{1}{n}[w_n^* - u_n^*].$

Since the true value of $\theta$ is $\theta_1$, the sequence $u_n^*$, $n = 1, 2, \ldots$, converges in
distribution to the chi-square distribution $C(1)$, and $v_n^*$, $n = 1, 2, \ldots$,
converges in probability to the constant $v_0^*$ where

(13.6.8)    $v_0^* = B^{*2}(\theta_1, \theta_1)(\theta_0 - \theta_1)^2.$

Now $v_0^* > 0$ since $\theta_1 \ne \theta_0$ and $B^{*2}(\theta_1, \theta_1) > 0$. Therefore,

(13.6.9)    $\lim_{n \to \infty} P(u_n^* + n v_n^* > \chi^2_\alpha) = 1$

which is equivalent to the statement that

(13.6.10)    $\lim_{n \to \infty} P(W_\alpha^* \mid \theta = \theta_1) = 1.$

But (13.6.3) and (13.6.10) together imply that the test provided by the
inequality (13.6.2) is consistent for testing $\mathcal{H}$.

The problem of comparing the power functions of the likelihood ratio
test $W_\alpha$ defined by the inequality (13.5.1) and the arbitrary test $W_\alpha^*$
defined by the inequality (13.6.2) is complicated by the fact that both tests
are consistent and hence have the same limiting power function $\eta(\theta)$
defined by (13.5.11) as $n \to \infty$. To overcome this difficulty, we shall
introduce and utilize some new notions to be defined below.

If $W_\alpha$ is a test for $\mathcal{H}$ based on a sample $(x_1, \ldots, x_n)$ whose limiting
power function $\eta(\theta)$, as $n \to \infty$, is defined as follows for $\theta$ in $\Omega_0$

(13.6.11)    $\eta(\theta_0) = \alpha,$
             $\alpha < \eta(\theta) \le 1, \quad \theta \ne \theta_0,$

we shall say that $W_\alpha$ is an asymptotically unbiased test for $\mathcal{H}(\theta_0; \Omega)$ at
significance level $\alpha$. Note that asymptotic unbiasedness of a test is a
weaker condition on the power of the test than consistency of the test.

Suppose $W_{1\alpha}$ and $W_{2\alpha}$ are two asymptotically unbiased tests of size $\alpha$
for $\mathcal{H}$ whose asymptotic power functions are $\eta_1(\theta)$ and $\eta_2(\theta)$ respectively
for $\theta$ in $\Omega_0$.

If

(13.6.12)    $\eta_1(\theta) \ge \eta_2(\theta)$

with $\eta_1(\theta_0) = \eta_2(\theta_0) = \alpha$, we shall say that $W_{1\alpha}$ is asymptotically more powerful*
than $W_{2\alpha}$ for testing $\mathcal{H}$. If the equality holds in (13.6.12) everywhere in $\Omega_0$,
we shall say that $W_{1\alpha}$ and $W_{2\alpha}$ are equivalent asymptotically unbiased tests
of size $\alpha$ for $\mathcal{H}$.

* Strictly speaking, we should use the phrase asymptotically at least as powerful
unless there is at least one value of $\theta$ for which only $>$ holds in (13.6.12). But use of the
shorter phraseology should cause no ambiguity.

Now let us return to the problem of comparing the power of tests $W_\alpha$
and $W_\alpha^*$ defined by (13.5.1) and (13.6.2), respectively. We shall consider
tests $W_{c\alpha}$ and $W_{c\alpha}^*$ defined by the inequalities

(13.6.13)    $u_n + c v_n > \chi^2_\alpha, \qquad u_n^* + c v_n^* > \chi^2_\alpha$

respectively, where $c > 0$ is arbitrary. Since $v_n$ and $v_n^*$ are both
non-negative, it is clear that $W_{c\alpha} \subset W_\alpha$ and $W_{c\alpha}^* \subset W_\alpha^*$ for $n > c$.

We shall show that for every $c > 0$, $W_{c\alpha}$ is asymptotically more powerful
than $W_{c\alpha}^*$ for testing $\mathcal{H}$, in which case we shall adopt Wald's (1941a)
terminology and say that $W_\alpha$ is asymptotically more stringent for testing $\mathcal{H}$
than $W_\alpha^*$.

If $W_{c\alpha}$ and $W_{c\alpha}^*$ are asymptotically equivalent for every $c > 0$, we shall
say that $W_\alpha$ and $W_\alpha^*$ are asymptotically equally stringent for testing $\mathcal{H}$.

An important case of asymptotically equally stringent tests may be
stated as follows:

13.6.1 Let $W_\alpha$ be the likelihood ratio test defined by (13.5.1) and $W_\alpha^*$
that defined by replacing $g_n(x_1, \ldots, x_n; \theta_0)$ by $h_n(x_1, \ldots, x_n; \theta_0)$
in (13.6.2), where $h_n(x_1, \ldots, x_n; \theta)$ is defined by (12.5.1). Then
if the c.d.f. $F(x; \theta)$ is regular with respect to its second $\theta$-derivative
in $\Omega_0$, $W_\alpha$ and $W_\alpha^*$ are asymptotically equally stringent for
testing $\mathcal{H}(\theta_0; \Omega)$.

We have shown in Section 13.5 essentially that $(u_n, v_n)$ in (13.6.13) is a
pair of random variables whose distribution converges, as $n \to \infty$, to a
degenerate distribution in the $(u, v)$-plane which is the chi-square
distribution $C(1)$ along the half-line $v = v_0$ and $u > 0$, where

(13.6.14)    $v_0 = 2[H(\theta_1, \theta_1) - H(\theta_1, \theta_0)].$

Similarly, we have shown that $(u_n^*, v_n^*)$ is a pair of random variables
whose distribution converges to a degenerate distribution in the $(u, v)$-plane
which is the chi-square distribution $C(1)$ along the half-line $v = v_0^*$ and
$u > 0$, where

(13.6.15)    $v_0^* = B^{*2}(\theta_1, \theta_1)(\theta_0 - \theta_1)^2.$

But

(13.6.16)    $H(\theta_1, \theta_0) = H(\theta_1, \theta_1) + H'(\theta_1, \theta_1)(\theta_0 - \theta_1)
              + \tfrac{1}{2} H''(\theta_1, \theta_1^*)(\theta_0 - \theta_1)^2$

where $|\theta_0 - \theta_1^*| < |\theta_0 - \theta_1|$. Since $H'(\theta_1, \theta_1) = 0$, we can write

(13.6.17)    $2[H(\theta_1, \theta_1) - H(\theta_1, \theta_0)] = -H''(\theta_1, \theta_1^*)(\theta_0 - \theta_1)^2.$

But

$H''(\theta_1, \theta_1) = -B^2(\theta_1, \theta_1)$

and since $F(x; \theta)$ is regular with respect to its second $\theta$-derivative in $\Omega_0$,
we have the result at $\theta = \theta_1$ that

(13.6.18)    $\frac{B^{*2}(\theta_1, \theta_1)}{B^2(\theta_1, \theta_1)} \le 1$

which, of course, is similar to (12.5.21) at $\theta = \theta_0$. Therefore, since $H''(\theta_1, \theta)$
is continuous in $\theta$ over $\Omega_0$, it follows that, for $\theta_0$ and $\theta_1$ sufficiently near
each other, (13.6.18) holds with $B^2(\theta_1, \theta_1)$ replaced by $-H''(\theta_1, \theta_1^*)$, which,
in turn, implies that

(13.6.19)    $0 < v_0^* \le v_0.$

Now consider the power functions of and for a fixed value of c, 

that is, consider 

(13.6.20) P(«„ + cp„>;fS|90 and P(«: + cr! > zl| ^i) 

as functions of 9^. Denote the limits of these functions as n -«■ oo, that is, 
the asymptotic power functions of and W^, by 

(13.6.21) 7 ,.( 9 i) and 


respectively. 

Since and are non-negative, and since the distributions of (u„, p„) 
and («•, V*) converge to degenerate distributions in the (u, »)-plane as 
K -> 00 which are chi-square distributions C(l) along the half-lines 
t) SB Pg, tt > 0 ; and p =» p*, « > 0 , respectively, it follows that 


(13.6.22) 

(0 Ve(Bi) =® »??(®i) = 1 . for values of 9^ such that pJ > — 

c 

2 

(ii) 1 > VeCBi) > Ve(Bi) > «. for values of 9^ such that 0 < Pg < — 

c 


(Hi) »?*(9g) = = a, for 9^ = 9o, that is, for 9i 

such that Pg = 0 . 

But it follows from (13.6.8) that the three sets of values of $\theta_1$ indicated 
in (13.6.22) (i), (ii), and (iii) consist, respectively, of (i) values of $\theta_1$ in $\Omega_0$ 
but outside some interval containing $\theta_0$, (ii) values of $\theta_1 \ne \theta_0$ inside 
that interval, and (iii) the value $\theta_1 = \theta_0$. 

Therefore, since the asymptotic power functions $\eta_c(\theta_1)$ and $\eta_c^*(\theta_1)$ 
satisfy (13.6.11) for every $c > 0$ and since the pair satisfies (13.6.12), it 
follows that $W_{c\alpha}$ is asymptotically more powerful than $W_{c\alpha}^*$ for testing $\mathcal{H}(\theta_0; \Omega)$, 
for every $c > 0$, and hence that $W_\alpha$ is asymptotically more stringent than 
$W_\alpha^*$ for testing $\mathcal{H}(\theta_0; \Omega)$. 

If $v_0^* = v_0$, then (13.6.22) holds with (ii) replaced by 

$1 > \eta_c(\theta_1) = \eta_c^*(\theta_1) > \alpha$ 

for values of $\theta_1$ such that $0 < v_0 < \chi_\alpha^2 / c$, which means that $W_{c\alpha}$ and 
$W_{c\alpha}^*$ are equivalent asymptotically unbiased tests for every $c > 0$, that is, $W_\alpha$ 
and $W_\alpha^*$ are asymptotically equally stringent for testing $\mathcal{H}(\theta_0; \Omega)$. But $v_0^* = v_0$ 
if and only if equality holds in (13.6.18) which, in turn, holds if and only if 
the regular estimating function $g_n(x_1, \ldots, x_n; \theta)$ used in defining $W_\alpha^*$ is 
replaced by $h_n(x_1, \ldots, x_n; \theta)$. But we know from 13.6.1 that using 
$h_n(x_1, \ldots, x_n; \theta_0)$ in (13.6.2) yields a test $W_\alpha^*$ which is asymptotically 
equally as stringent as the test $W_\alpha$ defined by (13.5.1). 

We may summarize our results in the form of the following important 
theorem: 

13.6.2 Suppose $(x_1, \ldots, x_n)$ is a sample from the c.d.f. $F(x; \theta)$ where 
$F(x; \theta)$ is regular with respect to its second $\theta$-derivative in $\Omega_0$. 
Let $W_\alpha$ be the likelihood ratio test of size $\alpha$ defined in (13.5.1). 
Suppose $g_n(x_1, \ldots, x_n; \theta)$ is any regular estimating function for $\theta$, 
as defined in Section 12.5, and let $W_\alpha^*$ be the test of size $\alpha$ based on 
$g_n(x_1, \ldots, x_n; \theta_0)$ defined by (13.6.2). Then $W_\alpha$ is asymptotically 
more stringent than $W_\alpha^*$ for testing the hypothesis $\mathcal{H}(\theta_0; \Omega)$. 
However, if we replace $g_n(x_1, \ldots, x_n; \theta)$ in (13.6.2) by $h_n(x_1, \ldots, x_n; \theta)$ 
as defined in (12.5.1), the resulting test $W_\alpha^*$ and the test $W_\alpha$ are 
asymptotically equally stringent for testing $\mathcal{H}(\theta_0; \Omega)$. 

13.7 THE LIKELIHOOD RATIO TEST OF A 
SIMPLE HYPOTHESIS 

In Sections 13.4, 13.5, and 13.6 we have presented the principal large-sample 
asymptotic properties of a likelihood ratio test for a simple 
hypothesis in which the parameter $\theta$ is one-dimensional; in fact, the 
important part of the parameter space $\Omega$ in the theory developed in those 
sections is an open interval $\Omega_0$ in $\Omega$ containing $\theta_0$. The basic argument 
by which these asymptotic properties were developed can be extended 
without essentially new difficulties to the case where $\theta$ is an $r$-dimensional 
parameter $(\theta_1, \ldots, \theta_r)$. It will therefore be sufficient to give the $r$-dimensional 
extensions of the results of Sections 13.4 through 13.6 with a 
minimum of detail. We shall regard the random variable $x$ having c.d.f. 
$F(x; \theta)$ as one-dimensional, although our results will hold for $k$-dimensional 
random variables with only minor changes in notation. 

The hypothesis to be considered here is the simple hypothesis 

fir) 

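which will be denoted by $\mathcal{H}(\theta_0; \Omega_r)$, where $\theta_0$ is the point $(\theta_{10}, \ldots, \theta_{r0})$ and $\Omega_r$ 
is any set in the Euclidean space $R_r$ which contains an $r$-dimensional 
open interval $\Omega_{r0}$ which, in turn, contains $\theta_0$. In dealing with asymptotic 
theory of likelihood ratio tests for $\mathcal{H}(\theta_0; \Omega_r)$, the part of $\Omega_r$ which will be of 
major interest is $\Omega_{r0}$. 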

First, it will be convenient to state the extension of 13.4.1 for the case 
of an $r$-dimensional parameter $\theta$ as: 

13.7.1 If $\theta$ is $r$-dimensional and if $F(x; \theta)$ is regular with respect to 
all of its second $\theta$-derivatives in $\Omega_{r0}$, then $H(\theta_0, \theta)$ as defined in 
(12.1.17) has a maximum of $H(\theta_0, \theta_0)$ at $\theta = \theta_0$. Furthermore, 
the negative of the matrix of second derivatives of $H(\theta_0, \theta)$ 
at $\theta = \theta_0$ is $\|B_{pq}(\theta_0, \theta_0)\|$, $p, q = 1, \ldots, r$, as defined in 
(12.1.19). 

The proof of 13.7.1 is a straightforward extension of that of 13.4.1 to 
the case of an $r$-dimensional parameter and will be omitted. It should 
be noted in the $r$-dimensional case that $H(\theta_0, \theta_0) - H(\theta_0, \theta_1)$ is Kullback's 
information integral for discriminating $F(x; \theta_0)$ against $F(x; \theta_1)$ per 
observation from $F(x; \theta_0)$, whereas $\|B_{pq}(\theta_0, \theta_0)\|$, Fisher's matrix of information 
pertaining to $\theta_0$ per observation from $F(x; \theta_0)$, is the matrix of 
second derivatives of Kullback's information integral at $\theta_1 = \theta_0$ [see 
Kullback and Leibler (1951)]. 
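
The relation between Kullback's information integral and Fisher's information can be checked numerically in a one-parameter case. The following is a minimal sketch, assuming a Poisson distribution with mean $\theta$ (an illustrative choice not made in the text); the function names are hypothetical. For this family the information integral is $\theta_0 \log(\theta_0/\theta_1) - \theta_0 + \theta_1$ and Fisher's information is $1/\theta_0$.

```python
import math


def kullback_information(theta0: float, theta1: float) -> float:
    """Information per observation for discriminating Poisson(theta0)
    against Poisson(theta1), i.e. H(theta0, theta0) - H(theta0, theta1)."""
    return theta0 * math.log(theta0 / theta1) - theta0 + theta1


def second_derivative_at_theta0(theta0: float, h: float = 1e-3) -> float:
    """Central-difference second derivative of the information integral
    with respect to theta1, evaluated at theta1 = theta0."""
    def f(t: float) -> float:
        return kullback_information(theta0, t)

    return (f(theta0 + h) - 2.0 * f(theta0) + f(theta0 - h)) / (h * h)


def fisher_information(theta0: float) -> float:
    """Fisher information per observation for a Poisson mean."""
    return 1.0 / theta0
```

Comparing `second_derivative_at_theta0(2.0)` with `fisher_information(2.0)` shows the two agree up to the finite-difference error.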

The r-dimensional version of 13.4.2 requires no change of wording, 
understanding, of course, that 0, and (5 are r-dimensional. The r- 
dimensional version of 13.4.3 may be stated as follows: 

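13.7.2 If $\theta$ is $r$-dimensional and if $F(x; \theta)$ is regular with respect to all of 
its second $\theta$-derivatives, and if $\mathcal{H}(\theta_0; \Omega_r)$ is true, then 

(13.7.1) $\lim_{n \to \infty} P(-2 \log \lambda_n < \chi^2) = \dfrac{1}{2^{r/2}\,\Gamma(\tfrac{1}{2} r)} \int_0^{\chi^2} u^{\frac{1}{2} r - 1} e^{-\frac{1}{2} u}\, du$ 

that is, $-2 \log \lambda_n$ converges in distribution to the chi-square 
distribution $C(r)$. 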

The proof is similar to that of 13.4.2 and is left to the reader. 

Theorem 13.5.1 holds if $\theta$ is $r$-dimensional. The argument is a straightforward 
extension of that for 13.5.1 and is omitted. 

In the extension of 13.6.1 to the case where $\theta$ is $r$-dimensional, $W_\alpha$ and 
$W_\alpha^*$ are tests defined in the sample space by the inequalities 

(13.7.2) $-2 \log \lambda_n > \chi_\alpha^2$ 
and 

(13.7.3) 

where $h_{pn}(x_1, \ldots, x_n; \theta_0)$ is defined by (12.9.1) and where $\chi_\alpha^2$ is the $100\alpha\%$ 
point of the chi-square distribution $C(r)$. The proof is a rather direct 
extension of that for 13.6.1 and is left to the reader. 

Theorem 13.6.2 extends without any particular difficulty to the case of 
an r-dimensional parameter and is left as an exercise for the reader. The 
$r$-dimensional version of $W_\alpha^*$ referred to in 13.6.2 is the region in the sample 
space in which > Xa where is defined in (12.9.15). Instead of the 
one-dimensional estimating function , ,., d) we would, of 

course, use the vector function hj,Jxi, ..., x^; 6), p = 1 ,..., r, defined 
in (12.9.1), that is, 

(13.7.4) fipnC^i, 0) = - 6). 

n 

13.8 THE LIKELIHOOD RATIO TEST OF A 
COMPOSITE HYPOTHESIS 

In problems of hypothesis testing where several parameters are involved, 
we are often interested in the case where the parameter subspace $\omega_{r'}$ is 
an $r'$-dimensional Euclidean section of $\Omega_r$, that is, a cylinder set consisting 
of points of form $(\theta_1, \ldots, \theta_{r'}, \theta_{r'+1,0}, \ldots, \theta_{r0})$, $\theta_{r'+1,0}, \ldots, \theta_{r0}$ having fixed 

values. Let us denote this composite hypothesis by Q,.) or more 

briefly by The likelihood ratio test for is defined in 

the usual way by (13.3.5). In large samples, the part of $\Omega_r$ of main interest 
is that lying within some open $r$-dimensional interval $\Omega_{r0}$ containing the 
true parameter point $(\theta_{10}, \ldots, \theta_{r0})$. 

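The basic theorem [Wilks (1938a)] about the likelihood ratio test can be 
stated as follows: 

13.8.1 Suppose $(x_1, \ldots, x_n)$ is a sample from the c.d.f. $F(x; \theta)$, where 
$\theta$ is $r$-dimensional, and $F(x; \theta)$ is regular in all of its second 
$\theta$-derivatives, for $\theta \in \Omega_{r0}$. Then 

(13.8.1) $\lim_{n \to \infty} P(-2 \log \lambda_n < \chi^2 \mid \theta \in \omega_{r'}) = \dfrac{1}{2^{(r - r')/2}\,\Gamma\!\left(\tfrac{1}{2}(r - r')\right)} \int_0^{\chi^2} u^{\frac{1}{2}(r - r') - 1} e^{-\frac{1}{2} u}\, du$ 

that is, if the composite hypothesis is true, $-2 \log \lambda_n$, $n = 1, 2, \ldots$, converges in 
distribution to a random variable having the chi-square distribution 
$C(r - r')$. 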

To establish this theorem, let $\hat\theta_1, \ldots, \hat\theta_r$ be defined as in 12.7.2. Let 
$\hat\theta'_1, \ldots, \hat\theta'_{r'}$ be the solutions of 

(13.8.2) ..., a:,; 0) - 0, / = 1,..., r', 

with respect to $\theta_1, \ldots, \theta_{r'}$, that is, $\hat\theta'_1, \ldots, \hat\theta'_{r'}$ are maximum likelihood 
estimators for $\theta_1, \ldots, \theta_{r'}$, the remaining components being fixed at 
$\theta_{r'+1,0}, \ldots, \theta_{r0}$, respectively. Under the assumptions of 12.7.3 it follows 
that $(\hat\theta'_1, \ldots, \hat\theta'_{r'})$ is asymptotically distributed, for large $n$, according to 

(13.8.3) II«^pvII"')> />', 9' = l,.. ., r'. 


Now referring to (12.6.1) let us adopt the following notation, where 
00 e 

/— ‘^Dn(^l» • • • » ^n» ^o) “ ^np 

y/n 

= Vnp 

(13.8.4) = v'n,. 

n 

P.€ = l. r, p',q'= I,. .. ,r’. 


Because of the regularity of $F(x; \theta)$ in all of its second $\theta$-derivatives it 
follows that for an arbitrary $\varepsilon > 0$, there is an $n_\varepsilon$ so that the probability 
exceeds $1 - \varepsilon$ that all four of the following equations hold for 
all $n > n_\varepsilon$: 


(13.8.5) 

(13.8.6) 

(13.8.7) 

(13.8.8) 


0*1 

r' 



2 log dFiXf, do) = 2 log dF(Xf, ..., ^,) 

t“i t=i 

+ i S 

P,0 = 1 

2 log dF(Xf, Bq) = 2 log dF(*{, . &r‘, 6,-+io. • • • . ®ro) 

i-i 

+ i i 
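where $\theta^*$ and $\theta^{\dagger}$ are points in $\Omega_{r0}$ between $\theta_0$ and $\hat\theta$, while $\theta^{*\prime}$ and $\theta^{\dagger\prime}$ are 
points in $\omega_{r'} \cap \Omega_{r0}$ between $\theta_0$ and $(\hat\theta'_1, \ldots, \hat\theta'_{r'}, \theta_{r'+1,0}, \ldots, \theta_{r0})$. 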

The and can be expressed as 

ptsl 

vn,- = 2 

P'=l 

and \\A%^{6*')\\ = \\A’/^i0*')\\-'^. 

Since the left-hand sides of (13.8.7) and (13.8.8) are the same, and since 

(13.8.10) -2 log Ajr,.,. = -ll f log dF(x^, 6;,..., 0;., 0,.+io.6^) 

it follows that for all $n > n_\varepsilon$ the probability exceeds $1 - \varepsilon$ that the following 
equality holds 

(13.8.11) -2l0gA.«^,.„= i Al?^(dt'XnAn.- - lA^XynnvVn, 

p\q'-1 3),g=l 

where the $\eta_{np}$ and $\eta'_{np'}$ are to be expressed in terms of the $\zeta_{np}$ by equations 
(13.8.9). 

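If $\theta_0 \in \omega_{r'}$ is the true value of $\theta$, we see from 12.7.1 that $(\zeta_{n1}, \ldots, \zeta_{nr})$, 
$n = 1, 2, \ldots$ is a sequence of random variables converging in distribution 
to $N(\{0\}; \|B_{pq}\|)$. Furthermore, the matrices $\|A_{pq}^{(n)}(\theta^*)\|$ and $\|A_{pq}^{(n)}(\theta^{\dagger})\|$ 
both converge in probability to the matrix $\|-B_{pq}\|$, whereas the matrices 
$\|A_{p'q'}^{(n)}(\theta^{*\prime})\|$ and $\|A_{p'q'}^{(n)}(\theta^{\dagger\prime})\|$ both converge in probability to $\|-B_{p'q'}\|$ as 
$n \to \infty$, where $B_{pq}$ is defined in 12.7.1. 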

Carrying out the algebra on the right-hand side of (13.8.11), we find 
that as $n \to \infty$ the two sides of (13.8.11) converge together in distribution 
to the distribution of a random variable $Q$ where 

(13.8.12) e = 2 - 2 Bp^y^.y,. 

V,Q-1 = l 

where and where (yi, ...,yr) bas the distribution 

$N(\{0\}, \|B_{pq}\|)$. It can be shown that, for $\theta_0 \in \omega_{r'}$, the distribution of $Q$ is 
the chi-square distribution $C(r - r')$. Thus, since the random variable 
$-2 \log \lambda_n$ converges in distribution to the same distribution as $Q$, 
we conclude the argument for 13.8.1. 


(13.8.9) 

where 
Finally, we remark without proof that under the assumptions of 13.8.1, 
the likelihood ratio test for the composite hypothesis is consistent, that is, 

$\lim_{n \to \infty} P(-2 \log \lambda_n > \chi_\alpha^2 \mid \theta \in \Omega_r - \omega_{r'}) = 1.$ 

By using the class of regular estimating functions 
mentioned in Section 12.9 rather than the score functions 
in (13.8.5) through (13.8.8), we can construct a class of tests for 
the composite hypothesis which are consistent. However, none of the tests in this class is 
asymptotically more stringent than the likelihood ratio test when the notion 
of asymptotic stringency is extended in a fairly straightforward manner to 
composite hypotheses. 
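
Theorem 13.8.1 can be illustrated by simulation. The following is a minimal sketch, assuming the setting of two independent binomial variables with $p_1 = p_2$ under the hypothesis, so that $r = 2$, $r' = 1$, and the limiting distribution is $C(1)$. The function names, the sample sizes, and the seed are hypothetical choices for illustration only.

```python
import math
import random


def xlogy(x: float, y: float) -> float:
    """x * log(y), with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binom_loglik(x: int, n: int, p: float) -> float:
    """Binomial log-likelihood, omitting the combinatorial constant
    (it cancels in the likelihood ratio)."""
    return xlogy(x, p) + xlogy(n - x, 1.0 - p)


def minus_two_log_lambda(x1: int, n1: int, x2: int, n2: int) -> float:
    """-2 log(lambda) for testing p1 = p2 from two binomial counts."""
    pooled = (x1 + x2) / (n1 + n2)          # m.l.e. under the hypothesis
    restricted = binom_loglik(x1, n1, pooled) + binom_loglik(x2, n2, pooled)
    unrestricted = (binom_loglik(x1, n1, x1 / n1)
                    + binom_loglik(x2, n2, x2 / n2))
    return max(0.0, -2.0 * (restricted - unrestricted))


def simulate_null(n1: int, n2: int, p: float, reps: int, seed: int = 0):
    """Draw `reps` values of -2 log(lambda) when p1 = p2 = p is true."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        x1 = sum(rng.random() < p for _ in range(n1))
        x2 = sum(rng.random() < p for _ in range(n2))
        draws.append(minus_two_log_lambda(x1, n1, x2, n2))
    return draws
```

For moderately large `n1` and `n2`, the average of `simulate_null(...)` should be close to 1, the mean of $C(1)$, and the fraction of draws exceeding 3.841 should be near 0.05.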

PROBLEMS 

13.1 A sample (xi,..., aj„) is assumed to come from a distribution having 

p.d.f. of form >0, and O = (0, 4 -oo). Determine, by using the 

Neyman-Pearson theorem, the critical region $W_\alpha$ in the sample space of 
$(x_1, \ldots, x_n)$ for testing the hypothesis $\mathcal{H}(\theta_0; \theta_0 \cup \theta_1)$ where $\theta_1 > \theta_0$ and which 
satisfies 

$P(W_\alpha \mid \theta_0) = \alpha.$ 

Determine the power function of this test, that is, $P(W_\alpha \mid \theta)$ as a function of $\theta$. 

13.2 Suppose $n_1$, $\bar{x}_1$, $s_1^2$ are the size, mean, and sample variance of a sample 
from the distribution $N(\mu_1, \sigma^2)$ and $n_2$, $\bar{x}_2$, $s_2^2$ are similar quantities for an 
independent sample from the distribution $N(\mu_2, \sigma^2)$. Consider the composite 
statistical hypothesis $\mathcal{H}(\omega; \Omega)$ defined as follows: 

$\Omega$ is the Euclidean half space of $(\mu_1, \mu_2, \sigma^2)$, $\sigma^2 > 0$; 

$\omega$ is the subset of $\Omega$ for which $\mu_1 = \mu_2$. 

Show that the likelihood ratio for testing $\mathcal{H}$ is equivalent to the Student 
ratio 

$t = \dfrac{\bar{x}_1 - \bar{x}_2}{s \sqrt{1/n_1 + 1/n_2}}$ 

where 

$s^2 = \dfrac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2}$ 

and $t$ has the "Student" distribution $S(n_1 + n_2 - 2)$ when $\mathcal{H}$ is true. Show that 
this test is unbiased. 


13.3 If the preceding problem is generalized to the case of $k$ samples where 
$n_i$, $\bar{x}_i$, $s_i^2$, $i = 1, \ldots, k$, are the size, mean, and sample variance of $k$ independent 
samples from $N(\mu_i, \sigma^2)$, $i = 1, \ldots, k$, respectively, and $\mathcal{H}(\omega; \Omega)$ is the statistical 
hypothesis for which 

$\Omega$ is the Euclidean half space of $(\mu_1, \ldots, \mu_k, \sigma^2)$, $\sigma^2 > 0$, 

$\omega$ is the subset of $\Omega$ for which $\mu_1 = \cdots = \mu_k$, 

show that the likelihood ratio for $\mathcal{H}(\omega; \Omega)$ is equivalent to the Snedecor 
ratio 

$\mathcal{F} = \dfrac{(n - k) \sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2}{(k - 1) \sum_{i=1}^{k} (n_i - 1) s_i^2}$ 

where $n = n_1 + \cdots + n_k$ and $\bar{x} = \dfrac{1}{n} \sum_{i=1}^{k} n_i \bar{x}_i$, and $\mathcal{F}$ has the Snedecor distribution 
$S(k - 1, n - k)$ if $\mathcal{H}(\omega; \Omega)$ is true. Show that this test is unbiased. 


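13.4 If $s_1^2$ and $s_2^2$ are variances in samples of sizes $n_1$ and $n_2$ from $N(\mu_1, \sigma_1^2)$ and 
$N(\mu_2, \sigma_2^2)$ respectively, and if $\mathcal{H}(\omega; \Omega)$ is the statistical hypothesis where $\Omega$ is the 
Euclidean quarter-$R_4$ space of $(\mu_1, \mu_2, \sigma_1^2, \sigma_2^2)$, $\sigma_1^2 > 0$, $\sigma_2^2 > 0$, and $\omega$ is the 
subset of $\Omega$ for which $\sigma_1^2 = \sigma_2^2$, show that the likelihood ratio for $\mathcal{H}(\omega; \Omega)$ is 
equivalent to the Snedecor ratio 

$\mathcal{F} = \dfrac{s_1^2}{s_2^2}$ 

which has the Snedecor distribution $S(n_1 - 1, n_2 - 1)$ if $\mathcal{H}(\omega; \Omega)$ is true. 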


13.5 Suppose $x_1$ and $x_2$ are independent random variables having binomial 
distributions $Bi(n_1, p_1)$ and $Bi(n_2, p_2)$. Let $\mathcal{H}(\omega; \Omega)$ be the statistical hypothesis 
for which $\Omega$ is the space of all possible points $(p_1, p_2)$ inside the unit square 
having vertices (0, 0), (0, 1), (1, 0), (1, 1), and $\omega$ is the subset of $\Omega$ for which 
$p_1 = p_2$. Show that the likelihood ratio for testing $\mathcal{H}(\omega; \Omega)$ is given by 

$\lambda = \dfrac{(x/n)^{x} (1 - x/n)^{n - x}}{\prod_{i=1}^{2} (x_i/n_i)^{x_i} (1 - x_i/n_i)^{n_i - x_i}}$ 

where $x = x_1 + x_2$ and $n = n_1 + n_2$ and that the limiting distribution of 
$-2 \log \lambda$ as $n_1, n_2 \to \infty$ is the chi-square distribution $C(1)$ if $\mathcal{H}(\omega; \Omega)$ is true. 

13.6 Generalize the preceding problem and its solution to the case where 
$x_1, \ldots, x_k$ are independent random variables from binomial distributions 
$Bi(n_1, p_1), \ldots, Bi(n_k, p_k)$. 

13.7 Test for independence in an $r \times s$ contingency table. Suppose 
$(n_{ij};\ i = 1, \ldots, r;\ j = 1, \ldots, s)$ is an $(rs - 1)$-dimensional random variable from the 
multinomial distribution 







where $\sum_i \sum_j n_{ij} = n$ and where each $p_{ij} > 0$ with $\sum_i \sum_j p_{ij} = 1$. Let $\mathcal{H}(\omega; \Omega)$ be 
the statistical hypothesis for which $\Omega$ is the space of all possible values of the $p_{ij}$ 
and $\omega$ is the subset of $\Omega$ for which $p_{ij} = p_i q_j$, where $p_i = \sum_j p_{ij}$, $q_j = \sum_i p_{ij}$. Show that 
the likelihood ratio for $\mathcal{H}(\omega; \Omega)$ is given by 


A = 
and that the limiting distribution of $-2 \log \lambda$ is chi-square $C((r - 1)(s - 1))$ if 
$\mathcal{H}(\omega; \Omega)$ is true. 

13.8 (Continuation) Let $Q$ be defined as follows 


where 


g 





«<• 



Show that if $\mathcal{H}$ is true, $Q$ and $-2 \log \lambda$ converge together in distribution to the 
chi-square distribution $C((r - 1)(s - 1))$ as $n \to \infty$. 

13.9 Independence of layers in an $r \times s \times t$ contingency table. Let 
$(n_{ijk};\ i = 1, \ldots, r;\ j = 1, \ldots, s;\ k = 1, \ldots, t)$ be an $(rst - 1)$-dimensional 
random variable having the multinomial distribution 


naic 


-r-7^r - UUUpZf 

nnn«.«! * ^ • 


n and pi^ic > 0 with 




= 1 . 


Let Jt{io\ ft) be the hypothesis where ft is the space of all possibleand weft 
is the set for which p^i^ == Pii<Ik^ 2 2 ~ Show that the likelihood 

ratio A for testing Jtf is given by 


A 


nni 

j i 

(”"’1 
\ « ) 

r 



nnni 




•k 


where tu,. =2 and «.. * = 2 2 
* i i 


and that the limiting distribution of $-2 \log \lambda$ as $n \to \infty$ is the chi-square distribution 
$C((rs - 1)(t - 1))$ if $\mathcal{H}(\omega; \Omega)$ is true. 

13.10 Suppose $(x_{11}, \ldots, x_{1 n_1})$ and $(x_{21}, \ldots, x_{2 n_2})$ are independent samples 
from Poisson distributions $Po(\mu_1)$ and $Po(\mu_2)$ respectively. Let $\mathcal{H}(\omega; \Omega)$ be the 
statistical hypothesis in which $\Omega$ is the space of all possible points in the first 
quadrant of the $\mu_1 \mu_2$-plane, whereas $\omega$ is the line $\mu_1 = \mu_2$ in $\Omega$. Show that the 
likelihood ratio $\lambda$ for $\mathcal{H}(\omega; \Omega)$ is given by 


A 


1*1 
where $\bar{x}_1$ and $\bar{x}_2$ are the means of the two samples respectively, 

$\bar{x} = \dfrac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$ 

and $n = n_1 + n_2$. Show that the limiting distribution of $-2 \log \lambda$ as $n_1, n_2 \to \infty$ 
is the chi-square distribution $C(1)$ if $\mathcal{H}(\omega; \Omega)$ is true. Generalize to $k$ samples. 

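13.11 The $(k - 1)$-dimensional random variable $(n_1, \ldots, n_k)$ has the multinomial 
distribution 

$\dfrac{n!}{n_1! \cdots n_k!}\, p_1^{n_1} \cdots p_k^{n_k}$ 

where $n_1 + \cdots + n_k = n$ and $p_i > 0$ with $p_1 + \cdots + p_k = 1$. Show that the 
likelihood ratio $\lambda$ for the hypothesis $\mathcal{H}(\omega; \Omega)$ where $\Omega$ is the space of all possible 
$p_i$ and $\omega$ is the subset for which $p_1 = p_{10}, \ldots, p_k = p_{k0}$ is given by 

$\lambda = \prod_{i=1}^{k} \left( \dfrac{p_{i0}}{\hat{p}_i} \right)^{n_i}$ 

where $\hat{p}_1 = n_1/n, \ldots, \hat{p}_k = n_k/n$ and show that the limiting distribution of 
$-2 \log \lambda$ as $n \to \infty$ is the chi-square distribution $C(k - 1)$ if $\mathcal{H}(\omega; \Omega)$ is true. 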

13.12 (Continuation) Show that as $n \to \infty$, $-2 \log \lambda$ and $\sum_{i=1}^{k} \dfrac{(n_i - n p_{i0})^2}{n p_{i0}}$ 
converge together to the chi-square distribution $C(k - 1)$ when $\mathcal{H}(\omega; \Omega)$ is 
true. 


13.13 If $s_1^2, \ldots, s_k^2$ are sample variances of independent samples of sizes 
$n_1, \ldots, n_k$ respectively, from the distributions $N(\mu_1, \sigma_1^2), \ldots, N(\mu_k, \sigma_k^2)$ respectively, 
and if $\mathcal{H}(\omega; \Omega)$ is the statistical hypothesis in which $\Omega$ is the $2^{-k}$ part of 
Euclidean space $R_{2k}$ of all possible values of $(\mu_1, \ldots, \mu_k, \sigma_1^2, \ldots, \sigma_k^2)$, $\sigma_1^2 > 0, \ldots, 
\sigma_k^2 > 0$, with $\omega$ being the subset of $\Omega$ for which $\sigma_1^2 = \cdots = \sigma_k^2$, 
show that the 
likelihood ratio $\lambda$ for $\mathcal{H}(\omega; \Omega)$ is given by 


where 



R/Jfc - 1)5?1 

L "I'So J 

L J 

_ («1 - 1)5? + • • 

• + («» - l)s| 

«! 4- • 

• + Wfc 


13.14 If $s^2$ is the variance of a sample of size $n$ from $N(\mu, \sigma^2)$, show that 
using the likelihood ratio for testing the hypothesis $\mathcal{H}(\omega; \Omega)$ where $\Omega$ is the space 
of points $(\mu, \sigma^2)$, $\sigma^2 \le \sigma_0^2$, and $\omega$ is the subset in $\Omega$ for which $\sigma^2 = \sigma_0^2$, is 
equivalent to using the ratio 

$\dfrac{(n - 1) s^2}{\sigma_0^2}$ 

which has the chi-square distribution $C(n - 1)$ if $\mathcal{H}(\omega; \Omega)$ is true, and hence the 
critical region of size $\alpha$ for rejecting $\mathcal{H}(\omega; \Omega)$ is the set of values of $s^2$ for 
which 


15 ^ Xoc 


where $\chi_\alpha^2$ is the $100\alpha\%$ point of the chi-square distribution $C(n - 1)$. 
13.15 Test for hypothesis of parallel regression lines. Suppose $y_{11}, \ldots, y_{1 n_1}$ 
are independent random variables having the distributions $N(\beta_{10} + \beta_{11} x_{1\xi}, \sigma^2)$, 
$\xi = 1, \ldots, n_1$, and $y_{21}, \ldots, y_{2 n_2}$ are independent random variables having the 
distributions $N(\beta_{20} + \beta_{21} x_{2\xi}, \sigma^2)$, $\xi = 1, \ldots, n_2$. Let $\mathcal{H}(\omega; \Omega)$ be the hypothesis 
in which $\Omega$ is the Euclidean half-$R_5$ space $(\beta_{10}, \beta_{11}, \beta_{20}, \beta_{21}, \sigma^2)$, $\sigma^2 > 0$, 
and $\omega$ is the subset of $\Omega$ for which $\beta_{11} = \beta_{21}$. Show that the likelihood ratio 
for $\mathcal{H}$ is equivalent to 

^ _ (^u, ■ Sc) 


SJini + n2 “ 4) 

where ^ has the Snedecor distribution 5'(1, Hi + H 2 "" 4) if ^ is true, and where 
2 

Sta 


' 2 2 - y,) - - *®)]* 

P-1 " 


•1 


Vp 


:1 ' 
fp 


jlpip* 


1 




P^P» 


5 — 

Up 

and where 


“p 

^ ^p)» 


= 1 , 2 , 






2 

■^n = 2 2 K^ifp - yj.) - ^(*. 1 , - VI* 


p=i (, 
ft _ oi + 

^ 61 + 6a' 


f are independent random variables 


13.16 If f = 1,..., r;= 1, • • 

having the distributions N(]a. + + /<.,, a*) where j y.f = 0, j /<., = 0. Let 

^ = 1 »/ —1 

Q) be the hypothesis with Cl defined as the set of all real vectors 
(p, Pi.,..., Pr., P. 1 ,. •., P.«» subject only to the conditions^ ^ P.»j = 

and to is the set in Cl for which = • • • = s 0. Show that the likelihood 
ratio for testing Jt" is equivalent to 

_ SJ(r - 1) 

S'-./Kr - 1)(^ - 1)] 


where ^ has the Snedecor distribution 5((r — 1), (r — l)(s — 1)) if Jt" is true. 
And where 

•S.0 -*)* 

s.. -2|;(*f, -«.,+*)* 

* “ JI *«”' “ p ? *«"• 

13.17 Test for equality of probability ratios in Luce's (1959) choice behavior 
model. In this behavior model it is assumed that in choosing between two 
alternatives $A_1$ and $A_2$ the ratio of the probability of choosing $A_2$ to the 
probability of choosing $A_1$ remains constant for variations of a third choice 
alternative $A_3$. The problem here is to construct a test for this hypothesis on the 
basis of experimental results. 

More precisely, suppose $(n_{1i}, n_{2i}, n_{3i})$, where $n_{1i} + n_{2i} + n_{3i} = n_i$, $i = 1, \ldots, 
k$, are $k$ independent sets of random variables having the trinomial distributions 


71,! 


^ . ^ 

Tiiil /la,! /I3,! 


$i = 1, \ldots, k$ where $p_i > 0$, $q_i > 0$, $r_i > 0$, $p_i + q_i + r_i = 1$. Let $\mathcal{H}(\omega; \Omega)$ be 
the hypothesis in which $\Omega$ consists of all possible $p_i$, $q_i$, $r_i$, that is, $p_i$, $q_i$, $r_i$ are 
positive and satisfy $p_i + q_i + r_i = 1$, $i = 1, \ldots, k$, whereas $\omega$ is the subset in 
$\Omega$ for which $q_i / p_i = \tau$ and $p_i(1 + \tau) + r_i = 1$. Show that the likelihood ratio $\lambda$ 
for testing $\mathcal{H}$ is given by 

-/,(1 + ?)]M 

jfc » 

IT 

where 

't =2«2<y'2»i< 



Pi 


fin + ”«< 
B <(1 + ?) ’ 


and that for large values of $n_1, \ldots, n_k$, $-2 \log \lambda$ has as its asymptotic distribution 
the chi-square distribution with $k - 1$ degrees of freedom if $\mathcal{H}$ is true. Also 
show that if $\mathcal{H}$ is true the variance of $\hat\tau$ in large samples is approximately 

+ 1 ) ycj 



CHAPTER 14 


Testing Nonparametric Statistical 
Hypotheses 


In Chapter 13, we gave a brief account of the theory of tests of parametric 
statistical hypotheses with special reference to likelihood ratio tests in large 
samples. In that theory the class of admissible c.d.f.'s was of the form 
$\{F(x; \theta) : \theta \in \Omega\}$, that is, a class of c.d.f.'s of specified functional form, the 
members of the class corresponding to the values of a real parameter $\theta$ 
in some parameter space $\Omega$, where, of course, either $x$ or $\theta$ or both can be 
multidimensional. 

In the theory of tests of nonparametric statistical hypotheses the class of 
admissible hypotheses is, in general, the class or a subclass of continuous 
c.d.f.'s, depending, of course, on the particular problem at hand. The 
general theory of nonparametric tests has not been as well developed as 
that for parametric tests. Therefore, in this chapter we discuss the 
theory of nonparametric statistical tests in terms of some of the more 
important problems in this field rather than attempt to set up a general 
theory of nonparametric tests. The reader interested in further literature on 
details of nonparametric tests is referred to books by Fraser (1957) and 
Kendall (1953), and also to survey articles by Kendall and Sundrum (1953), 
Moran, Whitfield, and Daniels (1950), Scheffé (1943), Wolfowitz (1949), 
and Wilks (1948, 1959a). A comprehensive bibliography has been 
published by Savage (1962). 

14.1 THE QUANTILE TEST 

The simplest type of nonparametric statistical test is one for testing the 
hypothesis that a sample $(x_1, \ldots, x_n)$ comes from a population whose 
c.d.f. $F(x)$ has a unique $p$th quantile $x_{0p}$, that is, 

(14.1.1) $F(x_{0p}) = p$. 

To specify the hypothesis precisely, let $\mathcal{C}_{0p}$ be the class of continuous 
c.d.f.'s having $x_{0p}$ as their $p$th quantile and let the admissible class $\mathcal{C}_p$ be the 
set of all continuous c.d.f.'s. Then the hypothesis in which we are interested 
may be designated as the hypothesis $\mathcal{H}(\mathcal{C}_{0p}; \mathcal{C}_p)$ that the c.d.f. $F(x)$ of the 
population from which the sample is drawn belongs to the subclass $\mathcal{C}_{0p}$ of 
$\mathcal{C}_p$. Basically, $\mathcal{H}(\mathcal{C}_{0p}; \mathcal{C}_p)$ is a nonparametric composite statistical 
hypothesis, not to be confused with a parametric composite statistical 
hypothesis $\mathcal{H}(\omega; \Omega)$. In the latter case $\Omega$ is a set of elements representable 
as a set of points in a Euclidean space and $\omega$ is a subset of $\Omega$, whereas in 
the case of $\mathcal{H}(\mathcal{C}_{0p}; \mathcal{C}_p)$ the set $\mathcal{C}_p$ of admissible c.d.f.'s cannot be placed 
into a bicontinuous one-to-one correspondence with the points of a set 
in a Euclidean space. Nor can this be done for $\mathcal{C}_{0p}$. 

Now let $(x_1, \ldots, x_n)$ be a sample from a c.d.f. in $\mathcal{C}_{0p}$. Let $r$ be the 
number of components of $(x_1, \ldots, x_n)$ whose values lie on the interval 
$(-\infty, x_{0p}]$; $r$ has the binomial distribution $Bi(n, p)$. An intuitively 
reasonable test $W_\alpha$ for $\mathcal{H}(\mathcal{C}_{0p}; \mathcal{C}_p)$ is that for which $r$ belongs to one of 
the two sets of integers 

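(14.1.2) $\{0, 1, \ldots, r_{1n}\}$, $\{r'_{1n}, \ldots, n\}$ 

where $r_{1n}$ is the largest integer for which 

(14.1.3a) $P(r \le r_{1n} \mid F \in \mathcal{C}_{0p}) \le \tfrac{1}{2}\alpha$ 

and $r'_{1n}$ is the smallest integer for which 

(14.1.3b) $P(r \ge r'_{1n} \mid F \in \mathcal{C}_{0p}) \le \tfrac{1}{2}\alpha$. 

It follows from 9.2.1a that, for large $n$, $r$ is asymptotically distributed 
according to $N(np, np(1 - p))$, from which it follows that 

(14.1.4) $\lim_{n \to \infty} P(W_\alpha \mid F \in \mathcal{C}_{0p}) = \alpha$. 

Approximations for $r_{1n}$ and $r'_{1n}$ for large $n$ are 

(14.1.5) $r_{1n} = np - y_\alpha \sqrt{npq} + O(1)$, 
         $r'_{1n} = np + y_\alpha \sqrt{npq} + O(1)$ 

where $q = 1 - p$, $y_\alpha > 0$ and $\Phi(-y_\alpha) = \tfrac{1}{2}\alpha$, where $\Phi(x)$ is the c.d.f. 
of $N(0, 1)$. 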
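
The exact critical values in (14.1.3a) and (14.1.3b) and the approximations in (14.1.5) can be compared numerically. The following is a minimal sketch; the function names are hypothetical, the tail conditions are taken to be "less than or equal to $\tfrac{1}{2}\alpha$" (an assumption about the inequalities in the scanned text), and the $O(1)$ terms of (14.1.5) are simply dropped.

```python
import math
from statistics import NormalDist


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability P(r = k) for r ~ Bi(n, p)."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def exact_critical_values(n: int, p: float, alpha: float):
    """Return (r_lower, r_upper): the largest integer with
    P(r <= r_lower) <= alpha/2 and the smallest integer with
    P(r >= r_upper) <= alpha/2. If no integer qualifies, r_lower is -1
    or r_upper is n + 1 (that tail of the critical region is empty)."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    r_lower, cumulative = -1, 0.0
    for k in range(n + 1):                 # accumulate the lower tail
        cumulative += pmf[k]
        if cumulative > alpha / 2:
            break
        r_lower = k
    r_upper, cumulative = n + 1, 0.0
    for k in range(n, -1, -1):             # accumulate the upper tail
        cumulative += pmf[k]
        if cumulative > alpha / 2:
            break
        r_upper = k
    return r_lower, r_upper


def normal_approximation(n: int, p: float, alpha: float):
    """Leading terms of (14.1.5), ignoring the O(1) remainder."""
    y_alpha = -NormalDist().inv_cdf(alpha / 2)
    half_width = y_alpha * math.sqrt(n * p * (1.0 - p))
    return n * p - half_width, n * p + half_width
```

For example, `exact_critical_values(100, 0.5, 0.05)` and `normal_approximation(100, 0.5, 0.05)` differ by an amount that stays bounded as $n$ grows, in line with the $O(1)$ terms.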

Now let us examine the consistency of $W_\alpha$. If $F(x)$ is any member of 
$\mathcal{C}_p - \mathcal{C}_{0p}$, then $r$ will have the binomial distribution $Bi(n, p')$, where 
$p' \ne p$. Since, as $n \to \infty$, $r/n$ converges in probability to the constant $p$ 
if $F \in \mathcal{C}_{0p}$ (the interval $(r_{1n}/n, r'_{1n}/n)$ converges to the point $p$) and to $p'$ $(\ne p)$ 
if $F \in \mathcal{C}_p - \mathcal{C}_{0p}$, it is evident that 

(14.1.6) $\lim_{n \to \infty} P(W_\alpha \mid F \in \mathcal{C}_p - \mathcal{C}_{0p}) = 1$. 

Therefore, 

14.1.1 The test $W_\alpha$ for which $r$ belongs to one of the two sets of integers in 
(14.1.2) is consistent for testing the hypothesis $\mathcal{H}(\mathcal{C}_{0p}; \mathcal{C}_p)$. 

It should be noted that the test for which $r \in \{0, 1, \ldots, r_n\}$ would 
be a lower one-tail test which would be consistent for testing any alternative 
in $\mathcal{C}_{0p}$ against any alternative in $\mathcal{C}_p$ having its $p$th quantile greater than $x_{0p}$. 
Similarly, the test for which $r \in \{r'_n, \ldots, n\}$ would be an upper one-tail test 
which would be consistent for testing any alternative in $\mathcal{C}_{0p}$ against any 
alternative in $\mathcal{C}_p$ having its $p$th quantile less than $x_{0p}$. 

Remark. When $p = 0.5$, the test is sometimes referred to as the sign test, 
and the hypothesis $\mathcal{H}(\mathcal{C}_{0p}; \mathcal{C}_p)$ reduces to the hypothesis that $(x_1, \ldots, x_n)$ comes 
from a c.d.f. $F(x)$ having $x_{0.5}$ as its median. A case of considerable practical 
importance arises when the sample components $(x_1, \ldots, x_n)$ are themselves 
differences of independent pairs of random variables $(u_1, v_1; u_2, v_2; \ldots; u_n, v_n)$, 
that is, $x_\xi = u_\xi - v_\xi$, $\xi = 1, \ldots, n$. The median $x_{0.5}$ in this instance is usually 
0. See Dixon and Mood (1946) for a fuller discussion of the sign test. Further 
tests concerning medians have been developed by Walsh (1949). 
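
As an illustration of the sign test for paired differences, here is a minimal sketch that computes an exact two-sided p-value from the binomial distribution $Bi(n, \tfrac{1}{2})$. The function name is hypothetical, and differences exactly equal to the hypothesized median are dropped, which is an assumption made here (under a continuous c.d.f. such ties have probability zero).

```python
import math


def sign_test_p_value(differences, median0: float = 0.0) -> float:
    """Exact two-sided sign test p-value for the hypothesis that the
    median of the differences equals median0."""
    centered = [d - median0 for d in differences if d != median0]
    n = len(centered)
    # r counts the components lying below the hypothesized median.
    r = sum(1 for d in centered if d < 0)
    smaller_tail = min(r, n - r)
    tail_probability = sum(math.comb(n, k)
                           for k in range(smaller_tail + 1)) / 2 ** n
    return min(1.0, 2.0 * tail_probability)
```

With ten differences of which one is negative, the one-tail probability is $(1 + 10)/2^{10}$, so the function returns twice that value.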

14.2 THE NONPARAMETRIC SIMPLE STATISTICAL 
HYPOTHESIS 


(a) Preliminary Remarks 

One of the basic nonparametric statistical hypotheses arises as follows. 
Suppose a sample $(x_1, \ldots, x_n)$ is from some continuous c.d.f. $F(x)$. 
Could the sample have "reasonably" come from the specified continuous 
c.d.f. $F_0(x)$? If we let $\mathcal{C}$ be the class of continuous c.d.f.'s, it will be 
convenient to denote this hypothesis by $\mathcal{H}(F_0; \mathcal{C})$ and refer to it as the 
nonparametric simple statistical hypothesis. 

Ideally, we would like to devise a consistent test for $\mathcal{H}(F_0; \mathcal{C})$, that 
is, one which would discriminate between $F_0$ and any c.d.f. in $\mathcal{C}$ different 
from $F_0$ in the case of indefinitely large samples. This, however, is too 
much to ask for. The most we can do is to construct tests which are 
consistent for testing $F_0$ against alternatives contained in various subclasses 
of $\mathcal{C}$. 

We shall consider three approaches to the problem of devising 
tests for $\mathcal{H}(F_0; \mathcal{C})$. The first consists of a nonparametric composite 
hypothesis based on Pearson's chi-square test; the second, which 
we shall call the empty cell test, is a simple nonparametric approach 
for c.d.f.'s in the class of absolutely continuous c.d.f.'s; and the third 
is a nonparametric treatment for c.d.f.'s in the class of continuous 
c.d.f.'s based on the confidence contours discussed in Sections 11.6 
and 11.7. 
(b) A Nonparametric Composite Hypothesis Based on Pearson's 
Chi-Square Test 

Let $x_{i/m}$ be the $(i/m)$th quantile of the c.d.f. $F_0(x)$, that is, $F_0(x_{i/m}) = i/m$, 
$i = 1, \ldots, m$, and let $\mathcal{C}_0$ be the subclass of c.d.f.'s in $\mathcal{C}$ having these 
same quantiles. Let $I_i$ be the interval $(x_{(i-1)/m}, x_{i/m}]$, $i = 1, \ldots, m$, with 
$x_0 = -\infty$, $x_1 = +\infty$. Let $r_i$ be the number of components of the sample 
falling into $I_i$. If the sample comes from any c.d.f. in $\mathcal{C}_0$, then the cell 
frequencies $(r_1, \ldots, r_m)$ form an $m$-dimensional random variable satisfying 
$r_1 + \cdots + r_m = n$, and having the $(m - 1)$-dimensional multinomial 
distribution $M(n; 1/m, \ldots, 1/m)$ with p.f. given by 

(14.2.1) $p(r_1, \ldots, r_m) = \dfrac{n!}{r_1! \cdots r_m!} \left( \dfrac{1}{m} \right)^{n}$. 


By considering $m$ fixed for all $n$, we shall, at a certain sacrifice to be 
described later, replace the problem of testing $\mathcal{H}(F_0; \mathcal{C})$ by a fairly simple 
problem of testing a nonparametric composite hypothesis by using 
Pearson's chi-square test criterion. This test, as we shall see, is consistent 
for testing the nonparametric composite hypothesis $\mathcal{H}(\mathcal{C}_0; \mathcal{C})$, that is, 
for testing any $F(x) \in \mathcal{C}_0$ against any alternative $F(x) \in \mathcal{C} - \mathcal{C}_0$, and is an 
adequate substitute for more refined nonparametric tests for $\mathcal{H}(F_0; \mathcal{C})$ 
for many practical statistical purposes. 

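For fixed $m$ let 

(14.2.2) $Q_n = \dfrac{m}{n} \sum_{i=1}^{m} \left( r_i - \dfrac{n}{m} \right)^2$ 

and let $W_{1\alpha}$ be the test for which $(r_1, \ldots, r_m)$ makes 

(14.2.3) $Q_n > \chi_{n\alpha}^2$ 

where $\chi_{n\alpha}^2$ is chosen for each $n$ to make the size of $W_{1\alpha}$ as close as possible 
to $\alpha$ when the sample is from a c.d.f. in $\mathcal{C}_0$. 

It follows from 9.3.2a that 

(14.2.4) $\lim_{n \to \infty} P(W_{1\alpha} \mid F \in \mathcal{C}_0) = \int_{\chi_\alpha^2}^{\infty} dF_{m-1}(\chi^2) = \alpha$ 

where $dF_{m-1}(\chi^2)$ is the p.e. of the chi-square distribution with $m - 1$ 
degrees of freedom given by (7.8.1). 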
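
The criterion (14.2.2) is straightforward to compute once each observation is assigned to its cell. The following is a minimal sketch; the function names are hypothetical, and the cell of an observation $x$ is found from $F_0(x)$ by noting that $x$ lies in $I_i$ exactly when $(i-1)/m < F_0(x) \le i/m$.

```python
import math


def cell_counts(sample, cdf0, m: int):
    """Count the sample components falling into each of the m cells
    determined by the (i/m)th quantiles of the hypothesized c.d.f."""
    counts = [0] * m
    for x in sample:
        u = cdf0(x)
        # Zero-based cell index; the clamp handles u = 0 and rounding.
        index = min(m - 1, max(0, math.ceil(u * m) - 1))
        counts[index] += 1
    return counts


def q_statistic(counts) -> float:
    """Pearson's chi-square criterion for m equiprobable cells:
    Q_n = (m / n) * sum_i (r_i - n / m)^2."""
    m, n = len(counts), sum(counts)
    expected = n / m
    return (m / n) * sum((r - expected) ** 2 for r in counts)
```

For a sample hypothesized to come from the uniform distribution on (0, 1), `cell_counts(sample, lambda x: x, m)` gives the cell frequencies, and `q_statistic` of those counts is referred to the chi-square distribution with $m - 1$ degrees of freedom when $n$ is large.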

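Now consider the power of $W_{1\alpha}$ for testing $\mathcal{H}(\mathcal{C}_0; \mathcal{C})$; that is, we wish 
to examine the value of $P(W_{1\alpha} \mid F \in \mathcal{C} - \mathcal{C}_0)$. Let $p_i$ be the amount of 
probability in $I_i$ as computed from some c.d.f. $F_1(x)$ in $\mathcal{C} - \mathcal{C}_0$. We have 

(14.2.5) $p_i = F_1(x_{i/m}) - F_1(x_{(i-1)/m})$, $i = 1, \ldots, m$ 

where, of course, the point $(p_1, \ldots, p_m)$ is different from $(1/m, \ldots, 1/m)$. 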
It follows from the multidimensional extension of 9.14 that if $(x_1, \ldots, x_n)$ 
is from the c.d.f. $F_1(x)$, the random variable $(r_1/n, \ldots, r_m/n)$ converges 
in probability to the point $(p_1, \ldots, p_m)$ as $n \to \infty$. Now for a given $n$ 
consider the set $E_n$ in the sample space of $(x_1, \ldots, x_n)$ for which 

(14.2.6) $\sum_{i=1}^{m} \left( \dfrac{r_i}{n} - \dfrac{1}{m} \right)^2 > D^2$ 

where $D^2$ is any positive number less than the (Euclidean) squared distance 
between $(p_1, \ldots, p_m)$ and $(1/m, \ldots, 1/m)$. Then it will be seen from the 
fact that $W_{1\alpha}$ consists of sample points for which 

(14.2.7) $\sum_{i=1}^{m} \left( \dfrac{r_i}{n} - \dfrac{1}{m} \right)^2 > \dfrac{\chi_{n\alpha}^2}{mn}$ 

where $\lim_{n \to \infty} \chi_{n\alpha}^2 = \chi_\alpha^2$, that for $n$ greater than some $n_1$, 

$E_n \subset W_{1\alpha}$. 

Therefore, for $n > n_1$, we have 

(14.2.8) $P(E_n \mid F_1) \le P(W_{1\alpha} \mid F_1)$. 

But, since $(r_1/n, \ldots, r_m/n)$ converges in probability to $(p_1, \ldots, p_m)$ as 
$n \to \infty$, then for an arbitrary $\delta > 0$, there is an $n_2$ such that 

$P(E_n \mid F_1) > 1 - \delta$ 

for $n > n_2$. Therefore, for $n > \max(n_1, n_2)$, we have 

$P(W_{1\alpha} \mid F_1) > 1 - \delta$, 

that is, 

(14.2.9) $\lim_{n \to \infty} P(W_{1\alpha} \mid F_1) = 1$. 

Thus, $W_{1\alpha}$ is consistent for testing the hypothesis $\mathcal{H}(\mathcal{C}_0; \mathcal{C})$ that $F(x)$ 
belongs to $\mathcal{C}_0$ against any alternative c.d.f. belonging to $\mathcal{C} - \mathcal{C}_0$. It is 
not consistent, of course, for testing $F_0(x)$ against other alternatives in 
$\mathcal{C}_0$. Summarizing, we have the following result: 

14.2.1 Let $\mathcal{C}_0$ be the class of continuous c.d.f.'s [including $F_0(x)$] which 
have the quantiles $x_{i/m}$, $i = 1, \ldots, m - 1$, and let $I_i = 
(x_{(i-1)/m}, x_{i/m}]$, $i = 1, \ldots, m$, with $x_0 = -\infty$ and $x_1 = +\infty$, be intervals 
determined by these quantiles. Let $r_1, \ldots, r_m$ be the numbers of 
components of a sample $(x_1, \ldots, x_n)$ falling into $I_1, \ldots, I_m$, 
respectively, and let $W_{1\alpha}$ be the test consisting of the points in the 
sample space of $(x_1, \ldots, x_n)$ for which $Q_n > \chi_{n\alpha}^2$, 
where $\chi_{n\alpha}^2$ is chosen to make the test as nearly of size $\alpha$ as 
possible for any $F \in \mathcal{C}_0$. Then for any $F \in \mathcal{C}_0$ (14.2.4) holds. The 
test is consistent for testing any $F \in \mathcal{C}_0$ against any $F \in \mathcal{C} - \mathcal{C}_0$. 
However, for any $F \in \mathcal{C}_0$ (14.2.4) holds, and hence $W_{1\alpha}$ is not 
consistent for testing $F_0$ against any alternative in $\mathcal{C}_0$. 
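
The lack of consistency against alternatives in the same class can be seen by computing cell probabilities as in (14.2.5). The following is a minimal sketch, assuming $F_0$ is the uniform c.d.f. on (0, 1) and taking a symmetric triangular c.d.f. as the alternative; both choices and all function names are illustrative assumptions, not taken from the text.

```python
def uniform_cdf(x: float) -> float:
    """Hypothesized c.d.f. F_0: uniform on (0, 1)."""
    return min(1.0, max(0.0, x))


def triangular_cdf(x: float) -> float:
    """Illustrative alternative: symmetric triangular c.d.f. on (0, 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 2.0 * x * x if x <= 0.5 else 1.0 - 2.0 * (1.0 - x) ** 2


def cell_probabilities(cdf, cut_points):
    """Probability assigned by `cdf` to each cell bounded by the given
    interior cut points; the support is assumed to be (0, 1)."""
    edges = [0.0] + list(cut_points) + [1.0]
    values = [cdf(e) for e in edges]
    return [upper - lower for lower, upper in zip(values, values[1:])]
```

With $m = 2$ the only cut point is the median 0.5, and both c.d.f.'s give cell probabilities (0.5, 0.5), so the test cannot separate them however large $n$ is. With $m = 4$ and cut points 0.25, 0.5, 0.75, the triangular c.d.f. gives unequal cell probabilities and falls outside the class.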


(c) The Empty Cell Test 


The approach to the problem of testing the nonparametric simple 
hypothesis $\mathcal{H}(F_0; \mathcal{C})$ which we considered above has as its major shortcoming 
the fact that $W_{1\alpha}$ is not a consistent test of $F_0$ against alternatives 
in $\mathcal{C}_0$. This class can, of course, be reduced by taking larger values of $m$ 
but holding $m$ fixed as $n \to \infty$. But, as long as $m$ remains fixed as $n \to \infty$ 
the test is essentially a composite nonparametric statistical test which will 
not distinguish $F_0$ from all other members of $\mathcal{C}$. The question arises as to 
whether one can devise a test which will distinguish $F_0$ from "almost all" 
alternatives in $\mathcal{C}$ by allowing both $m$ and $n$ to increase indefinitely. 

We shall consider a simple test which will distinguish $F_0$ from alternatives 
in a subclass $\mathcal{A}$ of the class of absolutely continuous c.d.f.'s, that 
is, the class of c.d.f.'s $\{F(x)\}$ which have derivatives (p.d.f.'s) $\{f(x)\}$, where 
of course, $F_0$ also belongs to $\mathcal{A}$. 

Let $s_0, s_1, \ldots, s_n$ be cell frequency counts determined by the sample 
$(x_1, \ldots, x_n)$, that is, the number of intervals $I_i$, $i = 1, \ldots, m$ containing 
$0, 1, \ldots, n$ components of the sample respectively. Then, if the sample 
is from any c.d.f. in $\mathcal{C}_0$, the p.f. of the (degenerate) random variable 
$(s_0, s_1, \ldots, s_n)$ is readily obtainable from (14.2.1) and is seen to be given 
by the following expression 

(14.2.10) $p(s_0, s_1, \ldots, s_n) = \dfrac{m!\, n!}{m^n (0!)^{s_0} (1!)^{s_1} \cdots (n!)^{s_n}\, s_0!\, s_1! \cdots s_n!}$ 

where, of course, $s_0, s_1, \ldots, s_n$ are non-negative integers satisfying the 
two conditions 

(14.2.11) $s_0 + s_1 + \cdots + s_n = m$, 
          $s_1 + 2 s_2 + \cdots + n s_n = n$. 

By using methods similar to those by which (6.1.4) and (6.1.7) were 
obtained, we find the general factorial moment of the components of the 
random variables $(s_0, \ldots, s_n)$ to be as follows: 

(14.2.12) $\mathcal{E}\left( s_0^{[g_0]} s_1^{[g_1]} \cdots s_n^{[g_n]} \right) = \dfrac{m!\, n!\, (m - g_0 - g_1 - \cdots - g_n)^{\,n - g_1 - 2 g_2 - \cdots - n g_n}}{m^n (0!)^{g_0} (1!)^{g_1} \cdots (n!)^{g_n}\, (m - g_0 - g_1 - \cdots - g_n)!\, (n - g_1 - 2 g_2 - \cdots - n g_n)!}$ 

where the conditions which the non-negative integers $g_0, \ldots, g_n$ must 
satisfy are $m - g_0 - \cdots - g_n \ge 0$ and $n - g_1 - 2 g_2 - \cdots - n g_n \ge 0$. 
From this expression we can find means, variances, covariances, 
and other moments of $s_0, \ldots, s_n$. 

The empty cell test, proposed by David (1950), is based on $s_0$, the number
of the intervals $I_1, \ldots, I_m$ which contain no components of the sample.
The mean and variance of $s_0$ are found from (14.2.12) to be as follows:

(14.2.13) $$\mathscr{E}(s_0) = m\left(1 - \frac{1}{m}\right)^n$$
$$\sigma^2(s_0) = m(m-1)\left(1 - \frac{2}{m}\right)^n + m\left(1 - \frac{1}{m}\right)^n - m^2\left(1 - \frac{1}{m}\right)^{2n}.$$


As a matter of fact, by examining the structure of (14.2.10) it will be seen
that $p(s_0, s_1, \ldots, s_n)$ is the coefficient of
$u_0^{s_0} u_1^{s_1} \cdots u_n^{s_n} v^n$ in the formal expansion of

(14.2.14) $$\frac{n!}{m^n}\left(u_0 + \frac{u_1 v}{1!} + \frac{u_2 v^2}{2!} + \cdots + \frac{u_n v^n}{n!}\right)^m$$

from which it is seen that the p.f. of $s_0$, say $p(s_0)$, is obtained by putting
$u_1 = \cdots = u_n = 1$ in (14.2.14) and taking the coefficient of $u_0^{s_0} v^n$
in the expansion of the resulting expression, namely,

(14.2.15) $$\frac{n!}{m^n}\left(u_0 + \frac{v}{1!} + \frac{v^2}{2!} + \cdots + \frac{v^n}{n!}\right)^m.$$

But the coefficient of $u_0^{s_0} v^n$ in (14.2.15) is the same as the coefficient of
$u_0^{s_0} v^n$ in

(14.2.16) $$\frac{n!}{m^n}\left(u_0 + e^v - 1\right)^m$$

which is the same as the coefficient of $v^n$ in $\varphi(v)$, where

(14.2.17) $$\varphi(v) = \frac{n!}{m^n}\binom{m}{s_0}(e^v - 1)^{m - s_0}.$$

But the coefficient of $v^n$ in $\varphi(v)$ is given by

(14.2.18) $$\frac{1}{n!}\left[\frac{d^n \varphi(v)}{dv^n}\right]_{v=0}.$$

Performing this differentiation and setting $v = 0$ we obtain

(14.2.19) $$p(s_0) = \sum_{i=0}^{m-s_0} \frac{(-1)^i\, m!\, (m - s_0 - i)^n}{m^n\, s_0!\, i!\, (m - s_0 - i)!}$$

where $s_0 = k, k+1, \ldots, m-1$; $k = \max(0, m-n)$.

In the case of large values of $m$ and $n$ the following result due to David
(1950) can be established by straightforward analysis:

14.2.2 If $(x_1, \ldots, x_n)$ is a sample from the continuous c.d.f. $F_0(x)$, then $s_0$
is asymptotically distributed, as $m, n \to \infty$ so that $n/m \to \rho > 0$,
according to $N(me^{-\rho}, m[e^{-\rho} - e^{-2\rho}(1 + \rho)])$.
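
As a numerical check on (14.2.13) and (14.2.19), the following Python sketch evaluates the p.f. of $s_0$ in exact rational arithmetic and compares its mean and variance with the closed forms. The function names `empty_cell_pf` and `mean_and_variance` and the values $m = 5$, $n = 8$ are illustrative choices only, not part of the text.

```python
from fractions import Fraction
from math import comb


def empty_cell_pf(m, n):
    """Exact p.f. of s0 (number of empty cells) following (14.2.19).

    m: number of equiprobable cells, n: sample size (both assumed >= 1).
    Returns a dict {s0: p(s0)} with exact rational values.
    """
    k = max(0, m - n)
    pf = {}
    for s0 in range(k, m):
        r = m - s0  # number of non-empty cells
        # Inner alternating sum of (14.2.19): sum_i (-1)^i C(r, i) (r - i)^n
        inner = sum((-1) ** i * comb(r, i) * (r - i) ** n for i in range(r + 1))
        pf[s0] = Fraction(comb(m, s0) * inner, m ** n)
    return pf


def mean_and_variance(pf):
    """Mean and variance of a finite p.f. given as {value: probability}."""
    mean = sum(s * p for s, p in pf.items())
    var = sum(s * s * p for s, p in pf.items()) - mean ** 2
    return mean, var


pf = empty_cell_pf(m=5, n=8)
mean, var = mean_and_variance(pf)
print(sum(pf.values()), mean, var)
```

Because the arithmetic is exact, the printed mean and variance can be compared term by term with (14.2.13) rather than only approximately.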

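Let $W_{s_0\alpha}$ be the test in the sample space of $(x_1, \ldots, x_n)$ for
which $s_0 \geq s_{0\alpha}$, where $s_{0\alpha}$ is the smallest integer for which

$$\sum_{s_0 = s_{0\alpha}}^{m-1} p(s_0) \leq \alpha.$$

It follows from 14.2.2 that for large $m$, $n$, with $n = \rho m + O(1)$,

(14.2.20) $$s_{0\alpha} = m\left[e^{-\rho} + \frac{y_\alpha}{\sqrt{m}}\bigl(e^{-\rho} - e^{-2\rho}(1 + \rho)\bigr)^{1/2} + o\!\left(\frac{1}{\sqrt{m}}\right)\right]$$

and that as $m, n \to \infty$, so that $n/m \to \rho > 0$,

(14.2.21) $$\lim P(W_{s_0\alpha} \mid F_0) = \frac{1}{\sqrt{2\pi}} \int_{y_\alpha}^{\infty} e^{-t^2/2}\, dt = \alpha.$$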
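
A rough value of the critical point can be computed from the leading terms of the large-sample approximation based on 14.2.2. The sketch below is only an illustration under stated assumptions: the function name `approx_critical_value` is hypothetical, $\rho$ is taken to be exactly $n/m$, the remainder term is dropped, and the upper $\alpha$ point of $N(0, 1)$ is obtained from the quantile function in Python's `statistics` module.

```python
from math import exp, sqrt
from statistics import NormalDist


def approx_critical_value(m, n, alpha):
    """Leading terms of the normal approximation to the critical point of s0.

    Uses the asymptotic mean m*exp(-rho) and variance
    m*[exp(-rho) - exp(-2*rho)*(1 + rho)] from 14.2.2, with rho = n/m.
    """
    rho = n / m
    y_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper-alpha point of N(0, 1)
    mean = m * exp(-rho)
    sd = sqrt(m * (exp(-rho) - exp(-2.0 * rho) * (1.0 + rho)))
    return mean + y_alpha * sd


print(approx_critical_value(m=200, n=200, alpha=0.05))
```

For moderate $m$ and $n$ the exact critical point obtained from (14.2.19) is preferable; the approximation is meant only for large samples.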

The choice of $W_{s_0\alpha}$ is plausible since it is intuitively evident that the
distribution of $s_0$ tends to be pushed to the right on the $s_0$-axis if the sample
comes from any distribution in $\mathscr{A}$ different from the one having c.d.f. $F_0(x)$.
We shall examine the plausibility of $W_{s_0\alpha}$ more formally by considering
its consistency for testing an absolutely continuous $F_0$ against alternatives
in $\mathscr{A}$, the class of absolutely continuous c.d.f.'s. If $f_0(x)$ is the derivative of
$F_0(x)$ and $f_1(x)$ is that of any c.d.f. $F_1(x)$ in $\mathscr{A}$ different from $F_0(x)$, then
$f_0(x)$ and $f_1(x)$ will differ over a set of positive probability as computed
from either $f_0(x)$ or $f_1(x)$. Let $\mathscr{A}^*$ be the subclass of $\mathscr{A}$ such that all moments
of the ratio $f_1(x)/f_0(x)$ exist with respect to $F_0(x)$ and also with respect to
$F_1(x)$. $\mathscr{A}^*$ contains $F_0(x)$. We then have the following result:

14.2.3 The test $W_{s_0\alpha}$ is consistent for testing $F_0(x)$ against any alternative
$F_1(x) \in \mathscr{A}^* - F_0(x)$ if $m, n \to \infty$ so that $n/m \to \rho > 0$.

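To prove 14.2.3 it is sufficient to show that if the sample $(x_1, \ldots, x_n)$
comes from any $F_1(x)$ in $\mathscr{A}^* - F_0$ then $s_0/m$ converges in probability to a
constant which exceeds $e^{-\rho}$ as $m, n \to \infty$ so that $n/m \to \rho > 0$.

Let $z_i$ be a random variable which has the value 1 if $I_i$ contains no
components of the sample and 0 otherwise, $i = 1, \ldots, m$. Then

$$s_0 = z_1 + \cdots + z_m.$$

Denoting $\mathscr{E}(s_0/m \mid F_1 \in \mathscr{A}^* - F_0)$ by $\mathscr{E}(s_0/m \mid F_1)$, we have

$$\mathscr{E}\left(\frac{s_0}{m} \,\Big|\, F_1\right) = \frac{1}{m}\bigl[\mathscr{E}(z_1) + \cdots + \mathscr{E}(z_m)\bigr].$$

But

$$\mathscr{E}(z_i) = (1 - p_i)^n, \qquad i = 1, \ldots, m,$$

where $p_i$ is the probability on $I_i$ computed from $f_1(x)$. Therefore

(14.2.22) $$\mathscr{E}\left(\frac{s_0}{m} \,\Big|\, F_1\right) = \frac{1}{m} \sum_{i=1}^{m} (1 - p_i)^n$$
$$= 1 - \frac{n}{m} + \frac{n(n-1)}{2!\,m} \sum_{i=1}^{m} p_i^2 - \frac{n(n-1)(n-2)}{3!\,m} \sum_{i=1}^{m} p_i^3 + \cdots + (-1)^k \frac{n(n-1)\cdots(n-k+1)}{k!\,m} \sum_{i=1}^{m} p_i^k + \cdots + \frac{(-1)^n}{m} \sum_{i=1}^{m} p_i^n.$$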
We can write the general term in this expression as follows: 


fe! \m/ \ m / \ m / \i =2 



“i“ ^m,n> 


where is the length of and 


- k + 


m 




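Taking limits as $m, n \to \infty$ so that $n/m \to \rho > 0$, and recalling that we
are assuming that the moments of $f_1(x)/f_0(x)$ with respect to either $F_0(x)$ or
$F_1(x)$ are all finite, we find that $\epsilon_{m,n} \to 0$ and the limit of the
expression (14.2.22) is

$$\sum_{k=0}^{\infty} \frac{(-1)^k \rho^k}{k!} \int_{-\infty}^{\infty} [h(x)]^k\, dF_0(x)$$

where $h(x) = f_1(x)/f_0(x)$. Hence, as $m, n \to \infty$ so that $n/m \to \rho > 0$, we
have

(14.2.23) $$\lim \mathscr{E}\left(\frac{s_0}{m} \,\Big|\, F_1\right) = \int_{-\infty}^{\infty} e^{-\rho h(x)}\, dF_0(x).$$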

Making use of the fact that if $g(x)$ is a non-negative random variable
whose mean value is finite, we have

(14.2.24) $$\mathscr{E}(g(x)) \geq e^{\mathscr{E}(\log g(x))}$$

with equality holding only if $g(x)$ is a constant with probability 1, we have

(14.2.25) $$\int_{-\infty}^{\infty} e^{-\rho h(x)}\, dF_0(x) \geq \exp\left[-\rho \int_{-\infty}^{\infty} h(x)\, dF_0(x)\right] = e^{-\rho}.$$

The equality holds if and only if $h(x) \equiv C$ with probability 1. But this
implies that $f_0(x) = f_1(x)$ with probability 1, which in turn implies that
$F_0(x) \equiv F_1(x)$. Thus if $f_0(x)$ and $f_1(x)$ differ over a set of positive probability,
that is, if $F_1 \in \mathscr{A}^* - F_0$,

(14.2.26) $$\lim \mathscr{E}\left(\frac{s_0}{m} \,\Big|\, F_1\right) > e^{-\rho}.$$
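
The strict inequality can be illustrated numerically. In the sketch below, which is an assumed example and not taken from the text, $F_0$ is the uniform c.d.f. on $(0, 1)$ and the alternative has p.d.f. $f_1(x) = 2x$, so that $h(x) = 2x$ is bounded and all its moments exist. The hypothetical helper `limit_mean_fraction_empty` evaluates the limiting integral $\int e^{-\rho h(x)}\, dF_0(x)$ by the midpoint rule.

```python
from math import exp


def limit_mean_fraction_empty(h, rho, grid=100_000):
    """Midpoint-rule value of the integral of exp(-rho * h(x)) over (0, 1).

    Assumes F0 is uniform on (0, 1), so that dF0(x) = dx there.
    """
    step = 1.0 / grid
    return sum(exp(-rho * h((j + 0.5) * step)) for j in range(grid)) * step


rho = 1.0
under_null = limit_mean_fraction_empty(lambda x: 1.0, rho)       # h = 1, i.e. F1 = F0
under_alt = limit_mean_fraction_empty(lambda x: 2.0 * x, rho)    # h(x) = 2x
print(under_null, under_alt)
```

For this alternative the integral has the closed form $(1 - e^{-2\rho})/(2\rho)$, which exceeds $e^{-\rho}$ for every $\rho > 0$, in agreement with (14.2.26).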

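To complete the argument for 14.2.3 it is sufficient to show that
$\sigma^2(s_0/m \mid F_1) \to 0$ as $m \to \infty$ for any $n$. If we denote the covariance matrix
of $(z_1, \ldots, z_m)$ by $\|\sigma_{ij}\|$, we have

(14.2.27) $$\sigma^2\left(\frac{s_0}{m} \,\Big|\, F_1\right) = \frac{1}{m^2} \sum_{i,j=1}^{m} \sigma_{ij}$$
$$= \frac{1}{m^2} \sum_{i=1}^{m} (1 - p_i)^n [1 - (1 - p_i)^n] + \frac{1}{m^2} \sum_{i \neq j} \bigl[(1 - p_i - p_j)^n - (1 - p_i)^n (1 - p_j)^n\bigr].$$

Since $(1 - p_i - p_j)^n \leq (1 - p_i)^n (1 - p_j)^n$, it follows that

$$\sigma^2\left(\frac{s_0}{m} \,\Big|\, F_1\right) \leq \frac{1}{m^2} \sum_{i=1}^{m} (1 - p_i)^n [1 - (1 - p_i)^n].$$

But $(1 - p_i)^n [1 - (1 - p_i)^n] \leq \tfrac{1}{4}$, $i = 1, \ldots, m$, for any $n$. Therefore

$$\sigma^2\left(\frac{s_0}{m} \,\Big|\, F_1\right) \leq \frac{1}{4m}$$

from which it follows that $\sigma^2(s_0/m \mid F_1) \to 0$ as $m \to \infty$. The fact that
$\lim \mathscr{E}(s_0/m \mid F_1) > \lim \mathscr{E}(s_0/m \mid F_0) = e^{-\rho}$ and that
$\sigma^2(s_0/m \mid F_1) \to 0$ as $m \to \infty$ implies that

$$\lim P(W_{s_0\alpha} \mid F_1 \in \mathscr{A}^* - F_0) = 1$$

as $m, n \to \infty$ so that $n/m \to \rho > 0$, thus completing the argument for
14.2.3.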

As a matter of fact, it has been shown by Okamoto (1952) and by
Kitabatake (1958), that if the sample $(x_1, \ldots, x_n)$ comes from any
distribution having c.d.f. $F_1(x)$ in $\mathscr{A}^*$ the asymptotic distribution of $s_0$ for
large $m$, $n$ with $n = \rho m + O(1)$, $\rho > 0$, is

$$N\left(m \int_{-\infty}^{\infty} e^{-\rho h(x)}\, dF_0(x),\; m\lambda^2\right)$$

where

(14.2.28) $$\lambda^2 = \int_{-\infty}^{\infty} \bigl[e^{-\rho h(x)} - e^{-2\rho h(x)}\bigr]\, dF_0(x) - \rho\left[\int_{-\infty}^{\infty} e^{-\rho h(x)} h(x)\, dF_0(x)\right]^2.$$

(d) Confidence Contours as Nonparametric Tests

In Sections 11.6 and 11.7, we discussed the problem of estimating the
c.d.f. $F(x)$ of a distribution from a sample $(x_1, \ldots, x_n)$ from the
distribution. The empirical c.d.f. $F_n(x)$ defined in (11.6.2) is the basic statistical
function in those estimation problems. In dealing with the problem of
devising a test of $F_0$ against alternatives in some subclass of the class
$\mathscr{C}$ of continuous c.d.f.'s, $F_n(x)$ suggests itself for a major role in such a test.

Anderson and Darling (1952), Cramér (1928), Kimball (1947), Kolmogorov
(1933b), Malmquist (1954), Sherman (1950), Smirnov (1939b),
von Mises (1931), and others have considered tests based on $F_n(x)$. We
refer the reader to a comparative discussion of these tests by Birnbaum
(1953b), and by Darling (1957). Of these various tests we shall discuss only
the confidence contours of Sections 11.6 and 11.7.

Referring to Section 11.6 let $W_{\delta\alpha}$ be the event in the sample space of
$(x_1, \ldots, x_n)$ for which

(14.2.29) $$F_0(x) < F_n(x) + d$$

fails to hold for all $x$. If $d$ is chosen so that $P(W_{\delta\alpha} \mid F_0) = \alpha$, then
$W_{\delta\alpha}$ is a test of size $\alpha$. We shall examine the power of this test. More precisely,
making use of results due to Birnbaum (1953a), we shall determine a lower
bound for $P(W_{\delta\alpha} \mid F_1)$ where $F_1(x)$ is a member of $\mathscr{C}$ different from $F_0(x)$
and satisfying a mild condition to be stated presently. Furthermore, under
this condition it will be shown that $W_{\delta\alpha}$ is a consistent test for $F_0$ against
alternatives in a broad subclass of $\mathscr{C}$.

We know from 11.6.1 that if $(x_1, \ldots, x_n)$ is a sample from $F_0(x)$ then we
can write

(14.2.30) $$P(F_0(x) < F_n(x) + d, \text{ for all } x \mid F_0) = P_n(d),$$

that is,

(14.2.31) $$P(W_{\delta\alpha} \mid F_0) = 1 - P_n(d)$$

where $P_n(d)$ is given by formula (11.6.5). For a given level of significance $\alpha$,
and sample size $n$, we can find a value of $d$, say $d'(n, \alpha)$, or briefly, $d'$, from
(11.6.5) so that $P_n(d') = 1 - \alpha$, thus making $W_{\delta\alpha}$ of size $\alpha$, that is,

(14.2.32) $$P(W_{\delta\alpha} \mid F_0) = \alpha.$$

Now suppose $F_1(x)$ is a c.d.f. in $\mathscr{C}$ different from $F_0(x)$. We wish to
examine the power $P(W_{\delta\alpha} \mid F_1)$ of the test against the alternative $F_1(x)$
where

(14.2.33) $$1 - P(W_{\delta\alpha} \mid F_1) = P(F_0(x) < F_n(x) + d', \text{ for all } x \mid F_1).$$

But $F_0(x) < F_n(x) + d'$ holds for all $x$ if and only if this inequality holds
at the jumps of $F_n(x)$, that is, if and only if

(14.2.34) $$F_0(x_{(\xi)}) < \frac{\xi - 1}{n} + d', \qquad \xi = 1, \ldots, n$$

where $x_{(1)}, \ldots, x_{(n)}$ are the order statistics of the sample. But (14.2.34)
holds if and only if

(14.2.35) $$x_{(\xi)} < F_0^{-1}\left(\frac{\xi - 1}{n} + d'\right), \qquad \xi = 1, \ldots, n$$

where $F_0^{-1}(y)$ is the inverse of $F_0(x)$. But (14.2.35) holds if and only if

(14.2.36) $$F_1(x_{(\xi)}) < F_1\left[F_0^{-1}\left(\frac{\xi - 1}{n} + d'\right)\right], \qquad \xi = 1, \ldots, n$$

holds.

Let

(14.2.37) $$F_1(F_0^{-1}(z)) = G(z)$$

and make the change of variables $y_{(\xi)} = F_1(x_{(\xi)})$, $\xi = 1, \ldots, n$. But we
know from (8.7.2) that the p.e. of $(y_{(1)}, \ldots, y_{(n)})$ is

(14.2.38) $$n!\, dy_{(1)} \cdots dy_{(n)}.$$

Therefore the probability that (14.2.36) holds, that is, the value of the
right-hand side of (14.2.33), is given by

(14.2.39) $$n! \int_0^{G(d')} \int_{y_{(1)}}^{G(1/n + d')} \cdots \int_{y_{(n-1)}}^{G((n-1)/n + d')} dy_{(n)} \cdots dy_{(1)}.$$

Now let us assume that

(14.2.40) $$\underset{-\infty < x < +\infty}{\text{l.u.b.}}\, (F_0(x) - F_1(x)) = \delta$$

and let $x'$ be a value of $x$ for which

(14.2.41) $$F_0(x') - F_1(x') = \delta.$$

If $k$ is the largest integer contained in $n(F_0(x') - d')$, then

(14.2.42) $$G\left(\frac{\xi - 1}{n} + d'\right) \leq F_0(x') - \delta, \qquad \xi \leq k + 1.$$

Furthermore,

(14.2.43) $$G\left(\frac{\xi - 1}{n} + d'\right) \leq 1, \qquad \xi = k + 2, \ldots, n.$$

Replacing $G[(\xi - 1)/n + d']$, $\xi = 1, \ldots, n$, in the limits of the integral
in (14.2.39) by the quantities on the right-hand sides of (14.2.42) and
(14.2.43) does not decrease the value of the integral. The resulting integral
with the new limits, which can be evaluated by induction, has the value

(14.2.44) $$1 - \sum_{i=0}^{k} \binom{n}{i} c^i (1 - c)^{n-i}$$

where

(14.2.45) $$c = F_0(x') - \delta.$$

Since $k$ is the largest integer in $n(F_0(x') - d')$, that is, in $n(c + \delta - d')$,
we have the following lower bound for the power of the test:

(14.2.46) $$P(W_{\delta\alpha} \mid F_1) \geq \sum_{i=0}^{k} \binom{n}{i} c^i (1 - c)^{n-i}.$$

We summarize as follows:

14.2.4 For a sample of size $n$ let $W_{\delta\alpha}$ be the set of points in the sample space
for which (14.2.29) fails to hold. Let $F_1(x)$ be a member of $\mathscr{C}$ such
that (14.2.40) holds. Then $P(W_{\delta\alpha} \mid F_1)$, the power of the test $W_{\delta\alpha}$,
has the lower bound indicated in (14.2.46).
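
The bound in 14.2.4 is a lower binomial tail and is easy to evaluate once $d'$ is known. In the sketch below the function name `power_lower_bound` is hypothetical, and $d'$ must be supplied by the user from (11.6.5), which is not reproduced here; the numerical values in the example call are illustrative only.

```python
from math import comb, floor


def power_lower_bound(n, c, delta, d_prime):
    """Binomial lower bound for the power, as in (14.2.46).

    n       : sample size
    c       : F0(x') - delta, i.e. the value F1(x')
    delta   : l.u.b. of F0(x) - F1(x), assumed attained at x'
    d_prime : critical distance d'(n, alpha), supplied by the caller
    """
    k = floor(n * (c + delta - d_prime))  # largest integer in n(F0(x') - d')
    if k < 0:
        return 0.0  # the bound is vacuous in this case
    k = min(k, n)
    return sum(comb(n, i) * c ** i * (1 - c) ** (n - i) for i in range(k + 1))


print(power_lower_bound(n=50, c=0.3, delta=0.25, d_prime=0.1))
```

The bound is informative only when $\delta$ exceeds $d'$ by a margin that is large compared with $\sqrt{c(1-c)/n}$.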

In the case of large samples it follows from 9.2.1a that the expression
on the right-hand side of (14.2.46) is approximately

(14.2.47) $$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y_n} e^{-t^2/2}\, dt$$

where

(14.2.48) $$y_n = \frac{\sqrt{n}\,(\delta - d')}{\sqrt{c(1 - c)}}.$$

Now $d'$ is a function of $n$ and $\alpha$, and $\lim_{n \to \infty} d' = 0$. Therefore, if
$\delta > 0$, the limit of the integral (14.2.47) as $n \to \infty$ is unity, and hence
if $\delta > 0$

(14.2.49) $$\lim_{n \to \infty} P(W_{\delta\alpha} \mid F_1) = 1.$$

Therefore, we have the following result:

14.2.5 Consider the subclass of $\mathscr{C}$ for which $\delta > 0$ in (14.2.40). Then
$W_{\delta\alpha}$ is a consistent test for the nonparametric simple hypothesis
that the c.d.f. is $F_0$ against alternatives in this subclass.

This result means roughly that if the graph of $F_0(x)$ lies above that of
$F_1(x)$ somewhere, then the probability ultimately becomes 1, as $n \to \infty$,
that the graph of $F_n(x) + d'$ will lie below that of $F_0(x)$ somewhere if
$(x_1, \ldots, x_n)$ actually comes from $F_1(x)$.

The reader will note from (11.6.1) that the test defined as the
set of points in the sample space for which

(14.2.29a) $$F_0(x) > F_n(x) - d'$$

fails to hold for all $x$ is also of size $\alpha$. Furthermore, for the
subclass of $\mathscr{C}$ for which $\delta' < 0$ in

(14.2.40a) $$\underset{-\infty < x < +\infty}{\text{g.l.b.}}\, (F_0(x) - F_1(x)) = \delta'$$

an argument similar to that used in establishing 14.2.5 shows that
this test is consistent for testing $F_0$ against alternatives in that subclass.
Formulations of companion theorems to 14.2.4 and 14.2.5 concerning this test
are left to the reader.

It now follows from 14.2.5 and from the consistency of the companion test
just described that the test whose critical region consists of the set of
points for which

(14.2.29b) $$|F_n(x) - F_0(x)| < d''$$

fails to hold for all $x$, $d''$ being chosen so that the test is of size $\alpha$, is
a consistent test of size $\alpha$ for testing $F_0(x)$ against any alternative in
the union of the two subclasses defined by $\delta > 0$ in (14.2.40) and
$\delta' < 0$ in (14.2.40a).


14.3 THE PROBLEM OF TWO SAMPLES 
FROM CONTINUOUS DISTRIBUTIONS 

(a) Introductory Remarks 

Suppose $(x_1, \ldots, x_{n_1})$ and $(x'_1, \ldots, x'_{n_2})$ are independent samples from
continuous c.d.f.'s $F_1(x)$ and $F_2(x)$, respectively, and let us denote these
samples by $O_{n_1}$ and $O_{n_2}$ respectively. A basic problem in nonparametric
statistical inference is to devise a test or tests of the statistical hypothesis
that $F_1(x) \equiv F_2(x)$ which is consistent against alternatives in various classes
of pairs $(F_1(x), F_2(x))$ in which $F_1(x) \not\equiv F_2(x)$, on the basis of information
provided by $O_{n_1}$ and $O_{n_2}$.

Tests of this type have been devised by Dixon (1940), Mathisen 
(1943), Smirnov (1939a), Wald and Wolfowitz (1940), Wilks (1961) and 
others. We shall consider several of these tests in this section. 

In discussing tests of the hypothesis that $F_1(x) \equiv F_2(x)$, we must consider
classes of alternatives to the assumption that $F_1(x) \equiv F_2(x)$. It will
be convenient to let $\mathscr{J}$ denote the class of all pairs of continuous c.d.f.'s
$(F_1(x), F_2(x))$, and let $\mathscr{J}_0$ be the subclass of $\mathscr{J}$ in which $F_1(x)$ and $F_2(x)$ are
identical, that is, $F_1(x) \equiv F_2(x)$; we shall denote the common c.d.f. by
$F(x)$. The most general form of a test for the hypothesis that $F_1(x) \equiv
F_2(x)$ would be a test that $(F_1(x), F_2(x)) \in \mathscr{J}_0$ against any alternative
$(F_1(x), F_2(x)) \in \mathscr{J} - \mathscr{J}_0$. The Smirnov (1939a) test to be considered
later in Section 14.3 is an example of such a general test which is consistent.
The tests to be considered here are consistent for testing the hypothesis
that $(F_1(x), F_2(x)) \in \mathscr{J}_0$ against alternatives only in certain subclasses of
$\mathscr{J} - \mathscr{J}_0$. If $\mathscr{J}'$ is any such subclass we shall refer to a test for testing
$(F_1(x), F_2(x)) \in \mathscr{J}_0$ against alternatives $(F_1(x), F_2(x)) \in \mathscr{J}'$ as

Before considering specific tests it will be convenient to discuss some 
distribution theory concerning two independent samples from identical 
and continuous c.d.f.’s. 

(b) Distribution Theory of Basic Random Variables Generated 
by the Order Statistics of Two Samples 

A simple but fundamental theorem on the order statistics of two samples 
from populations having identical and continuous c.d.f.’s can be stated as 
follows: 

14.3.1 If $O_{n_1}$ and $O_{n_2}$ are independent samples from identical continuous
c.d.f.'s whose order statistics are $(x_{(1)}, \ldots, x_{(n_1)})$ and
$(x'_{(1)}, \ldots, x'_{(n_2)})$ respectively, and if $(y_{(1)}, \ldots, y_{(n_1+n_2)})$ is any
ordered set of the two sets of order statistics, then

(14.3.1) $$P(y_{(1)} < \cdots < y_{(n_1+n_2)}) = 1 \Big/ \binom{n_1 + n_2}{n_1}.$$

To verify this statement let $F(x)$ be the common c.d.f. from which $O_{n_1}$
and $O_{n_2}$ are drawn. For simplicity it is sufficient to consider the probability
of a specified “meshing” of order statistics of the two samples, say

(14.3.2) $$P(x_{(1)} < \cdots < x_{(n_1)} < x'_{(1)} < \cdots < x'_{(n_2)}).$$

Let us relabel the random variables in (14.3.2) as $y_{(1)}, \ldots, y_{(n_1+n_2)}$. If we
let $v_1 = F(y_{(1)}), \ldots, v_{n_1+n_2} = F(y_{(n_1+n_2)})$, then the p.e. of the $v$'s is,
according to (8.7.2),

(14.3.3) $$n_1!\, n_2!\, dv_1 \cdots dv_{n_1+n_2}.$$

The event $y_{(1)} < \cdots < y_{(n_1+n_2)}$ occurs if and only if $v_1 < \cdots < v_{n_1+n_2}$. The
probability of this latter event is

(14.3.4) $$\frac{n_1!\, n_2!}{(n_1 + n_2)!} = 1 \Big/ \binom{n_1 + n_2}{n_1}.$$

It is now evident that (14.3.4) holds for any arrangement of the two sets
of order statistics in (14.3.2), thus concluding the argument for 14.3.1.
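
For small $n_1$ and $n_2$ the statement 14.3.1 can be checked by direct enumeration. The sketch below assumes only that, for samples from identical continuous c.d.f.'s, all $(n_1 + n_2)!$ assignments of ranks to the pooled observations are equally likely; the function name `meshing_counts` and the labels `"x"` and `"y"` are illustrative.

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial


def meshing_counts(n1, n2):
    """Count how many of the (n1+n2)! equally likely rank assignments
    produce each meshing of the two sets of order statistics."""
    total = n1 + n2
    counts = Counter()
    for ranks in permutations(range(total)):
        # ranks[idx] is the rank of observation idx in the pooled sample;
        # the first n1 observations form the first sample.
        labels = [""] * total
        for idx, rank in enumerate(ranks):
            labels[rank] = "x" if idx < n1 else "y"
        counts["".join(labels)] += 1
    return counts


counts = meshing_counts(3, 2)
print(len(counts), set(counts.values()), factorial(5) // comb(5, 3))
```

Every meshing is produced by the same number $n_1!\, n_2!$ of rank assignments, so each has probability $n_1!\, n_2!/(n_1 + n_2)!$.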

As before, let $O_{n_1}$ and $O_{n_2}$ be independent samples from $F_1(x)$ and
$F_2(x)$ respectively, where $(F_1(x), F_2(x)) \in \mathscr{J}_0$, the common c.d.f. being $F(x)$.
The order statistics of $O_{n_1}$ determine sample blocks $B_1^{(1)}, \ldots, B_1^{(n_1+1)}$ as
defined in Section 8.7. Let the coverages (see Section 8.7) for these blocks
as determined by $F(x)$ be $u_1, \ldots, u_{n_1+1}$, respectively.

Let $r_1, \ldots, r_{n_1+1}$ be the number of elements of $O_{n_2}$ that lie in
$B_1^{(1)}, \ldots, B_1^{(n_1+1)}$, respectively, where $r_{n_1+1} = n_2 - r_1 - \cdots - r_{n_1}$.
It will be convenient to refer to the random variables $(r_1, \ldots, r_{n_1+1})$ as
block frequencies which $O_{n_1}$ determines from $O_{n_2}$. We shall consider the
distribution of these block frequencies.

Now $(u_1, \ldots, u_{n_1}, r_1, \ldots, r_{n_1})$ is a mixed $2n_1$-dimensional random
variable, the $u$'s being continuous and the $r$'s discrete such that the
conditional p.f. of the $r$'s for fixed values of the $u$'s is

(14.3.5) $$\frac{n_2!}{r_1! \cdots r_{n_1+1}!}\, u_1^{r_1} \cdots u_{n_1+1}^{r_{n_1+1}}$$

where $u_{n_1+1} = 1 - u_1 - \cdots - u_{n_1}$. But we know from 8.7.4 that the p.e.
of the $u$'s is

(14.3.6) $$n_1!\, du_1 \cdots du_{n_1}.$$

Thus, the joint p.f.-p.e. of the $u$'s and $r$'s is the product of the expressions
in (14.3.5) and (14.3.6). To obtain the (unconditional) p.f. of the $r$'s we
integrate this product over the simplex $S_{n_1}$: $\{(u_1, \ldots, u_{n_1}): u_1 > 0, \ldots,
u_{n_1} > 0,\ u_1 + \cdots + u_{n_1} < 1\}$. If we use the properties of the Dirichlet
distribution in Section 7.7, this integration yields

(14.3.7) $$1 \Big/ \binom{n_1 + n_2}{n_1}$$

as the p.f. of the $n_1$-dimensional random variable $(r_1, \ldots, r_{n_1})$ where the
$r$'s are all 0 or positive integers such that $r_1 + \cdots + r_{n_1} \leq n_2$.
By following a line of argument similar to that which yielded (14.3.7)
it can be shown that any $t$ of the random variables $(r_1, \ldots, r_{n_1+1})$, which we
may take as $(r_1^*, \ldots, r_t^*)$, have the p.f.

(14.3.8) $$\binom{n_1 + n_2 - r_1^* - \cdots - r_t^* - t}{n_1 - t} \Big/ \binom{n_1 + n_2}{n_1}.$$
Summarizing, we have the following result:

14.3.2 Let $O_{n_1}$ and $O_{n_2}$ be independent samples from identical continuous
c.d.f.'s. Let $(r_1, \ldots, r_{n_1+1})$ be the block frequencies which $O_{n_1}$
determines from $O_{n_2}$. Then the p.f. of $(r_1, \ldots, r_{n_1+1})$, subject to
the condition $r_1 + \cdots + r_{n_1+1} = n_2$, has the constant value
$1 \big/ \binom{n_1 + n_2}{n_1}$ over all possible sample points of $(r_1, \ldots, r_{n_1+1})$.
Furthermore, the p.f. of any $t$ of these random variables, say
$(r_1^*, \ldots, r_t^*)$, is given by (14.3.8).


If we let $(r'_1, \ldots, r'_{n_2+1})$ be random variables denoting the block
frequencies which $O_{n_2}$ determines from $O_{n_1}$, then $(r'_1, \ldots, r'_{n_2+1})$ has

(14.3.9) $$1 \Big/ \binom{n_1 + n_2}{n_2}$$

as its p.f. over all sample points of $(r'_1, \ldots, r'_{n_2+1})$ where, of course,
$r'_1 + \cdots + r'_{n_2+1} = n_1$. Since the expressions in (14.3.7) and (14.3.9) have
the same value, the p.f.'s of $(r_1, \ldots, r_{n_1+1})$ and $(r'_1, \ldots, r'_{n_2+1})$ therefore
have the same constant value over the two sample spaces.

It should be noted that the p.f. of the block frequencies $(r_1, \ldots,
r_{n_1})$ as given by (14.3.7) is identically the same as that for the different
possible arrangements of the combined order statistics of $O_{n_1}$ and $O_{n_2}$ as
given by (14.3.1), which, of course, is not surprising since there is a one-to-
one correspondence between the possible values of the vector random
variable $(r_1, \ldots, r_{n_1+1})$ and the possible arrangements of the combined
order statistics in $O_{n_1}$ and $O_{n_2}$.

Remark. The continuous c.d.f. from which $O_{n_1}$ and $O_{n_2}$ are drawn can be
$k$-dimensional. In this case we can adapt any method of cutting $k$-dimensional
sample blocks described in Section 8.7(c) and let $B_k^{(1)}, \ldots, B_k^{(n_1+1)}$ be the
resulting blocks formed by the $n_1$ points in $R_k$ representing $O_{n_1}$. Then let
$(r_1, \ldots, r_{n_1+1})$ be the numbers of the $n_2$ points in $R_k$ representing $O_{n_2}$
falling into these blocks respectively. Then the distribution of the random
variable $(r_1, \ldots, r_{n_1+1})$ is exactly the same as that given in 14.3.2.

Now let $s_0$ be the number of the blocks $B_1^{(1)}, \ldots, B_1^{(n_1+1)}$ which contain
0 elements from $O_{n_2}$, $s_1$ the number which contain 1 element from $O_{n_2}$,
and so on; $s_0, s_1, \ldots, s_{n_2}$ will be called block frequency counts which $O_{n_1}$
determines from $O_{n_2}$. Now $(s_0, s_1, \ldots, s_{n_2})$ is a multidimensional random
variable which must satisfy the conditions

(14.3.10) $$s_0 + s_1 + \cdots + s_{n_2} = n_1 + 1, \qquad s_1 + 2s_2 + \cdots + n_2 s_{n_2} = n_2.$$

Since all points in the sample space of the random variable $(r_1, \ldots, r_{n_1+1})$
are equally probable, the problem of determining the p.f. of $(s_0, \ldots, s_{n_2})$
is merely one of enumerating points in the sample space of the $r$'s satisfying
certain conditions. After a little reflection the reader will see that the
number of sample points of the random variable $(r_1, \ldots, r_{n_1+1})$ which
map into a given sample point $(s'_0, s'_1, \ldots, s'_{n_2})$ of the random variable
$(s_0, s_1, \ldots, s_{n_2})$ is the coefficient of $u_0^{s'_0} u_1^{s'_1} \cdots u_{n_2}^{s'_{n_2}} v^{n_2}$ in the formal
expansion of

(14.3.11) $$(u_0 + u_1 v + u_2 v^2 + \cdots + u_{n_2} v^{n_2})^{n_1+1}.$$

Extracting this coefficient, dividing by $\binom{n_1+n_2}{n_1}$, and dropping dashes,
we obtain the following p.f. of $(s_0, s_1, \ldots, s_{n_2})$:

(14.3.12) $$\frac{(n_1 + 1)!}{s_0!\, s_1! \cdots s_{n_2}!} \Big/ \binom{n_1 + n_2}{n_1}$$

where $s_0, s_1, \ldots, s_{n_2}$, of course, satisfy the conditions (14.3.10). Note that
the number of $s$'s involved in (14.3.12) depends on $n_2$. If one desires the
p.f. of only a fixed number, say $(s_0, s_1, \ldots, s_t)$, one would set $u_{t+1} =
\cdots = u_{n_2} = u$ in (14.3.11) and select the coefficient of $u_0^{s_0} u_1^{s_1} \cdots u_t^{s_t} u^s v^{n_2}$
where $s_0 + s_1 + \cdots + s_t + s = n_1 + 1$, and $s = s_{t+1} + \cdots + s_{n_2}$. For
$t = 0$, this procedure yields for the p.f. of $s_0$

(14.3.13) $$p(s_0) = \binom{n_1 + 1}{s_0} \binom{n_2 - 1}{n_1 - s_0} \Big/ \binom{n_1 + n_2}{n_1}$$

the sample points of $s_0$ being $k, k+1, \ldots, n_1$, where $k = \max(0,
n_1 - n_2 + 1)$. For larger values of $t$ the p.f. is complicated and is omitted.
Factorial moments of one or more of the random variables $(s_0, s_1, \ldots, s_{n_2})$
can be obtained from the p.f. (14.3.12) by methods similar to those
used for deriving (6.1.7). In general, we find

(14.3.14) $$\mathscr{E}\bigl(s_0^{[g_0]} s_1^{[g_1]} \cdots s_t^{[g_t]}\bigr) = (n_1 + 1)^{[g_0 + \cdots + g_t]} \binom{n_1 + n_2 - g_0 - 2g_1 - \cdots - (t+1)g_t}{n_1 - g_0 - \cdots - g_t} \Big/ \binom{n_1 + n_2}{n_1}$$

where, of course, $g_0, \ldots, g_t$ are non-negative integers such that $g_0 + \cdots
+ g_t \leq n_1$ and $g_1 + 2g_2 + \cdots + t g_t \leq n_2$. From (14.3.14) one can find
means, variances, covariances, and other moments of one or more of the
$s$'s.

Summarizing, we have the result:

14.3.3 If $O_{n_1}$ and $O_{n_2}$ are independent samples from identical continuous
c.d.f.'s the p.f. of the block frequency counts $(s_0, s_1, \ldots, s_{n_2})$ which
$O_{n_1}$ determines from $O_{n_2}$ is given by (14.3.12) where the components
of $(s_0, s_1, \ldots, s_{n_2})$ satisfy (14.3.10). Furthermore, the general
factorial moment of $(s_0, s_1, \ldots, s_t)$ is given by (14.3.14).

In particular, for $s_0$, we have

(14.3.15) $$\mathscr{E}(s_0) = \frac{n_1(n_1 + 1)}{n_1 + n_2}$$
$$\sigma^2(s_0) = \frac{n_1^2 (n_1^2 - 1)}{(n_1 + n_2)(n_1 + n_2 - 1)} + \frac{n_1(n_1 + 1)}{n_1 + n_2} - \frac{n_1^2 (n_1 + 1)^2}{(n_1 + n_2)^2}.$$
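
The p.f. (14.3.13) and the moments (14.3.15) can be cross-checked in exact arithmetic. The following sketch uses hypothetical names (`empty_block_pf`, `moments`) and the illustrative sample sizes $n_1 = 4$, $n_2 = 6$; it assumes $n_2 \geq 1$.

```python
from fractions import Fraction
from math import comb


def empty_block_pf(n1, n2):
    """Exact p.f. of s0 (number of empty blocks) following (14.3.13).

    Assumes both samples come from the same continuous c.d.f. and n2 >= 1.
    """
    k = max(0, n1 - n2 + 1)
    denom = comb(n1 + n2, n1)
    return {
        s0: Fraction(comb(n1 + 1, s0) * comb(n2 - 1, n1 - s0), denom)
        for s0 in range(k, n1 + 1)
    }


def moments(pf):
    """Mean and variance of a finite p.f. given as {value: probability}."""
    mean = sum(s * p for s, p in pf.items())
    var = sum(s * s * p for s, p in pf.items()) - mean ** 2
    return mean, var


n1, n2 = 4, 6
mean, var = moments(empty_block_pf(n1, n2))
print(mean, var)
```

The mean returned by `moments` agrees with $n_1(n_1+1)/(n_1+n_2)$, and the variance with the three-term expression in (14.3.15).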

If we let $n_2 = \rho n_1 + O(1)$, $\rho > 0$, these reduce to

(14.3.16) $$\mathscr{E}(s_0) = n_1\left(\frac{1}{1 + \rho} + O\!\left(\frac{1}{n_1}\right)\right)$$
$$\sigma^2(s_0) = n_1\left(\frac{\rho^2}{(1 + \rho)^3} + O\!\left(\frac{1}{n_1}\right)\right).$$

Furthermore, the following statement summarizes the situation concerning
the asymptotic distribution of $s_0$ in large samples:

14.3.4 If $O_{n_1}$ and $O_{n_2}$ are independent samples from identical continuous
c.d.f.'s, and if $n_1$, $n_2$ are large so that $n_2 = \rho n_1 + O(1)$, then $s_0$ is
asymptotically distributed according to
$N\left(\dfrac{n_1}{1 + \rho}, \dfrac{n_1 \rho^2}{(1 + \rho)^3}\right)$.

The proof of 14.3.4 can be accomplished by approximating all factorials
in the expression for $p(s_0)$ in (14.3.13) by Stirling's formula, with

(14.3.17) $$s_0 = \frac{n_1}{1 + \rho} + y\, \frac{\rho \sqrt{n_1}}{(1 + \rho)^{3/2}}$$

from which it will be found that

(14.3.18) $$\lim \sum_{s_0 = s_0(y')}^{s_0(y'')} p(s_0) = \frac{1}{\sqrt{2\pi}} \int_{y'}^{y''} e^{-y^2/2}\, dy$$

where $s_0(y')$ and $s_0(y'')$ are the largest integers contained in the numbers
obtained by respectively substituting $y'$ and $y''$ for $y$ in the right-hand side
of (14.3.17). We omit the details.


(c) An Empty Block Test 

The first nonparametric two-sample test we shall consider is a simple
empty block test suggested by the David one-sample empty cell test
discussed in Section 14.2(c). The test $W_{s_0\alpha}$ has as its critical region the set
of values of $s_0$ for which $s_0 \geq s_0(\alpha, n_1, n_2)$, where $s_0(\alpha, n_1, n_2)$ is the
smallest integer for which

(14.3.19) $$P(W_{s_0\alpha} \mid (F_1, F_2) \in \mathscr{J}_0) \leq \alpha.$$

Since the p.f. of $s_0$, if $(F_1, F_2) \in \mathscr{J}_0$, that is, $F_1(x) \equiv F_2(x)$, is given by
(14.3.13), values of $s_0(\alpha, n_1, n_2)$ for small values of $n_1$ and $n_2$ can be
determined from this p.f.

For large values of $n_1$, $n_2$ for which $n_2 = \rho n_1 + O(1)$, it follows from
14.3.4 that we can approximate the critical value $s_0(\alpha, n_1, n_2)$ as follows:

(14.3.20) $$s_0(\alpha, n_1, n_2) = \frac{n_1}{1 + \rho} + y_\alpha \frac{\rho \sqrt{n_1}}{(1 + \rho)^{3/2}} + O(1)$$

where

(14.3.21) $$\frac{1}{\sqrt{2\pi}} \int_{y_\alpha}^{\infty} e^{-t^2/2}\, dt = \alpha.$$

Thus, it follows from 14.3.4 that if $n_1, n_2 \to \infty$ so that $n_2/n_1 \to \rho > 0$, then

(14.3.22) $$\lim P(W_{s_0\alpha} \mid (F_1, F_2) \in \mathscr{J}_0) = \alpha.$$
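
For small samples the critical value can be read off the exact null distribution of the number of empty blocks, which is a hypergeometric-type p.f. The sketch below is one way to do this; the function name `empty_block_critical_value` is hypothetical, and the convention of returning $n_1 + 1$ (an empty critical region) when even the largest sample point has probability exceeding $\alpha$ is an assumption of the sketch, not a statement from the text.

```python
from fractions import Fraction
from math import comb


def empty_block_critical_value(n1, n2, alpha):
    """Smallest integer c with P(s0 >= c) <= alpha when F1 = F2.

    The null p.f. used is C(n1+1, s0) C(n2-1, n1-s0) / C(n1+n2, n1).
    Returns n1 + 1 if no non-empty critical region has size <= alpha.
    """
    denom = comb(n1 + n2, n1)
    lowest = max(0, n1 - n2 + 1)
    tail = Fraction(0)
    c = n1 + 1  # empty critical region, whose probability is 0
    # Accumulate the upper tail, starting from the largest sample point.
    for s0 in range(n1, lowest - 1, -1):
        tail += Fraction(comb(n1 + 1, s0) * comb(n2 - 1, n1 - s0), denom)
        if tail > alpha:
            break
        c = s0
    return c


print(empty_block_critical_value(n1=4, n2=6, alpha=0.05))
```

Since $s_0$ is discrete, the attained size is in general strictly smaller than the nominal $\alpha$.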

The test is plausible since it is intuitively apparent that the distribution
of $s_0$ tends to be pushed to the right on the $s_0$-axis if $(F_1, F_2)$ does not
belong to $\mathscr{J}_0$.

We shall examine this plausibility more formally by examining the
consistency of $W_{s_0\alpha}$. Let $F_1^{-1}(u)$ be the inverse of the c.d.f. $F_1(x)$ and let
$\mathscr{J}^*$ be the class of pairs of continuous c.d.f.'s $(F_1(x), F_2(x))$ such that:

(i) $F_2(F_1^{-1}(u))$ has a derivative $g(u)$ for all $u$ on $(0, 1)$ except possibly
for a set of 0 probability,

(ii) The derivatives of $F_2(F_1^{-1}(u))$ and $F_1(F_1^{-1}(u))$ $(= u)$ with respect to $u$
on $(0, 1)$ differ over a set of positive probability.

We shall prove the following result:

14.3.5 The test $W_{s_0\alpha}$ is consistent for testing any $(F_1, F_2) \in \mathscr{J}_0$ against any
$(F_1, F_2) \in \mathscr{J}^*$ as $n_1, n_2 \to \infty$ so that $n_2 = \rho n_1 + O(1)$, where $\rho > 0$.

To prove 14.3.5 it is sufficient to show that if $(F_1, F_2) \in \mathscr{J}^*$, $s_0/(n_1 + 1)$
converges in probability to a number greater than $1/(1 + \rho)$ as $n_1, n_2 \to \infty$
so that $n_2/n_1 \to \rho > 0$. It will be recalled that $1/(1 + \rho)$ is the quantity
to which $s_0/(n_1 + 1)$ converges in probability if $(F_1, F_2) \in \mathscr{J}_0$.

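Let $z_\xi$ be a random variable whose value is one if sample block $B_1^{(\xi)}$
determined by $O_{n_1}$ contains no components of $O_{n_2}$ and 0 otherwise,
$\xi = 1, \ldots, n_1 + 1$. Then

(14.3.23) $$s_0 = z_1 + \cdots + z_{n_1+1}.$$

We now compute $\mathscr{E}(s_0)$ assuming $O_{n_1}$ and $O_{n_2}$ are independent samples
respectively, from a pair of c.d.f.'s $(F_1, F_2)$ belonging to $\mathscr{J}^*$. Since

$$\mathscr{E}(z_\xi) = P(z_\xi = 1)$$

we have, denoting $\mathscr{E}[s_0/(n_1 + 1) \mid (F_1, F_2) \in \mathscr{J}^*]$ by $\mathscr{E}[s_0/(n_1 + 1) \mid \mathscr{J}^*]$,

(14.3.24) $$\mathscr{E}\left(\frac{s_0}{n_1 + 1} \,\Big|\, \mathscr{J}^*\right) = \frac{1}{n_1 + 1}\bigl[P(z_1 = 1) + \cdots + P(z_{n_1+1} = 1)\bigr].$$

Now for $\xi = 2, \ldots, n_1$, we have

(14.3.25) $$P(z_\xi = 1) = \frac{n_1!}{(\xi - 2)!\,(n_1 - \xi)!} \iint_{S_2} \bigl[1 - F_2(x_{(\xi)}) + F_2(x_{(\xi-1)})\bigr]^{n_2} \bigl[F_1(x_{(\xi-1)})\bigr]^{\xi-2} \bigl[1 - F_1(x_{(\xi)})\bigr]^{n_1-\xi}\, dF_1(x_{(\xi-1)})\, dF_1(x_{(\xi)})$$

where $S_2$ is the region in Euclidean space for which $-\infty < x_{(\xi-1)} <
x_{(\xi)} < +\infty$.

Expressions for $P(z_1 = 1)$ and $P(z_{n_1+1} = 1)$ are slightly simpler than
(14.3.25) but are such that each divided by $n_1 + 1$ has a limit of 0 as
$n_1, n_2 \to \infty$ with $n_2/n_1 \to \rho > 0$.

Let us change the variables as follows

$$u_1 = F_1(x_{(\xi-1)}), \qquad u_2 = F_1(x_{(\xi)}).$$

Then (14.3.25) becomes

(14.3.26) $$P(z_\xi = 1) = \frac{n_1!}{(\xi - 2)!\,(n_1 - \xi)!} \iint_{T_2} \bigl[1 - G(u_2) + G(u_1)\bigr]^{n_2} u_1^{\xi-2} (1 - u_2)^{n_1-\xi}\, du_1\, du_2$$

where $T_2$ is the triangle $0 < u_1 < u_2 < 1$ and $G(u) = F_2(F_1^{-1}(u))$. We
assume that $G(u)$ has a derivative $g(u)$. Then $g(u)$ is a probability density
function on $(0, 1)$.

Thus, we have

(14.3.27) $$\mathscr{E}\left(\frac{s_0}{n_1 + 1} \,\Big|\, \mathscr{J}^*\right) = \frac{1}{n_1 + 1} \sum_{\xi=2}^{n_1} P(z_\xi = 1) + \epsilon_{n_1, n_2}$$

where $\epsilon_{n_1, n_2} = (n_1 + 1)^{-1} [P(z_1 = 1) + P(z_{n_1+1} = 1)]$.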

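Inserting the expression for $P(z_\xi = 1)$ from (14.3.26) into (14.3.27) and
performing the summation we obtain

(14.3.28) $$\mathscr{E}\left(\frac{s_0}{n_1 + 1} \,\Big|\, \mathscr{J}^*\right) = \frac{n_1(n_1 - 1)}{n_1 + 1} \iint_{T_2} [1 - u_2 + u_1]^{n_1-2}\, [1 - G(u_2) + G(u_1)]^{n_2}\, du_1\, du_2 + \epsilon_{n_1, n_2}.$$

Let

$$u_1 = v, \qquad u_2 = v + \frac{y}{n_1}.$$

Then (14.3.28) can be written

(14.3.29) $$\mathscr{E}\left(\frac{s_0}{n_1 + 1} \,\Big|\, \mathscr{J}^*\right) = \frac{n_1(n_1 - 1)}{n_1(n_1 + 1)} \int_0^1 \int_0^{n_1(1-v)} \left[1 - \frac{y}{n_1}\right]^{n_1-2} \left[1 - G\!\left(v + \frac{y}{n_1}\right) + G(v)\right]^{n_2} dy\, dv + \epsilon_{n_1, n_2}.$$

Taking limits as $n_1, n_2 \to \infty$ so that $n_2/n_1 \to \rho > 0$, we obtain

(14.3.30) $$\lim \mathscr{E}\left(\frac{s_0}{n_1 + 1} \,\Big|\, \mathscr{J}^*\right) = \int_0^1 \int_0^{\infty} e^{-y[1 + \rho g(v)]}\, dy\, dv.$$

Performing the integration with respect to $y$ we finally obtain

(14.3.31) $$\lim \mathscr{E}\left(\frac{s_0}{n_1 + 1} \,\Big|\, \mathscr{J}^*\right) = \int_0^1 \frac{dv}{1 + \rho g(v)}.$$

We must now show that the integral on the right-hand side of (14.3.31)
exceeds $1/(1 + \rho)$ if $g(v)$ differs from unity over a set of positive probability,
which will be the case if $(F_1, F_2) \in \mathscr{J}^*$ since the derivatives of $F_2(F_1^{-1}(u))$
and $u$ are assumed to differ over a set of positive probability on $(0, 1)$.

We note that under this condition the following strict Schwarz inequality
holds:

(14.3.32) $$\int_0^1 \frac{dv}{1 + \rho g(v)} \cdot \int_0^1 [1 + \rho g(v)]\, dv > \left[\int_0^1 dv\right]^2.$$

But the right-hand side of the inequality is 1 and the second integral on
the left has the value $1 + \rho$. Therefore

(14.3.33) $$\int_0^1 \frac{dv}{1 + \rho g(v)} > \frac{1}{1 + \rho}$$

that is to say, as $n_1, n_2 \to \infty$ with $n_2 = \rho n_1 + O(1)$, $\rho > 0$,

$$\lim \mathscr{E}\left(\frac{s_0}{n_1 + 1} \,\Big|\, \mathscr{J}^*\right) > \lim \mathscr{E}\left(\frac{s_0}{n_1 + 1} \,\Big|\, \mathscr{J}_0\right).$$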
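
The strict inequality can be seen in a concrete case. The sketch below takes, purely as an assumed example, $F_1$ uniform on $(0, 1)$ and $F_2(x) = x^2$ on $(0, 1)$, so that $G(u) = u^2$ and $g(v) = 2v$; the hypothetical helper `limit_fraction_empty_blocks` evaluates $\int_0^1 dv/(1 + \rho g(v))$ by the midpoint rule.

```python
from math import log


def limit_fraction_empty_blocks(g, rho, grid=100_000):
    """Midpoint-rule value of the integral of 1 / (1 + rho * g(v)) over (0, 1)."""
    step = 1.0 / grid
    return sum(1.0 / (1.0 + rho * g((j + 0.5) * step)) for j in range(grid)) * step


rho = 1.0
null_limit = limit_fraction_empty_blocks(lambda v: 1.0, rho)      # g = 1: F1 = F2
alt_limit = limit_fraction_empty_blocks(lambda v: 2.0 * v, rho)   # g(v) = 2v
print(null_limit, alt_limit, log(1.0 + 2.0 * rho) / (2.0 * rho))
```

For $g(v) = 2v$ the integral equals $\log(1 + 2\rho)/(2\rho)$, which is larger than $1/(1 + \rho)$ for every $\rho > 0$.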

To complete the proof of the consistency of the empty block test it is
sufficient to show that the variance of $s_0/(n_1 + 1)$ for $(F_1, F_2) \in \mathscr{J}^*$, for
large $n_1$, $n_2$ with $n_2 = n_1 \rho + O(1)$, is of order $1/n_1$. We have

(14.3.34) $$\sigma^2\left(\frac{s_0}{n_1 + 1} \,\Big|\, \mathscr{J}^*\right) = \frac{1}{(n_1 + 1)^2} \sum_{\xi, \eta = 1}^{n_1+1} \sigma_{\xi\eta}$$

where $\|\sigma_{\xi\eta}\|$ is the covariance matrix of $(z_1, \ldots, z_{n_1+1})$. The covariance
between $z_\xi$ and $z_\eta$ can be expressed as follows:

(14.3.35) $$\sigma_{\xi\eta} = P(z_\xi = 1, z_\eta = 1) - P(z_\xi = 1) P(z_\eta = 1).$$

We shall now show that $\sigma_{\xi\eta} \leq 0$ for every $\xi \neq \eta$. We know that the
elementary coverages $u_1, \ldots, u_{n_1}$ generated by the first sample have
probability element given by $n_1!\, du_1 \cdots du_{n_1}$, in (14.3.6). Denote this
differential by $dH(u)$.

Let $p_\xi = F_2(x_{(\xi)}) - F_2(x_{(\xi-1)})$, with a similar expression for $p_\eta$. Since
$u_1 + \cdots + u_\xi = F_1(x_{(\xi)})$, $\xi = 1, \ldots, n_1$, is a one-to-one transformation
between the $u_\xi$ and $x_{(\xi)}$, it is evident that the random variables $p_1, \ldots, p_{n_1+1}$
are functions of $u_1, \ldots, u_{n_1}$. For a given first sample we have the
following conditional probability:

(14.3.36) $$P(z_\xi = 1, z_\eta = 1 \mid u_1, \ldots, u_{n_1}) = (1 - p_\xi - p_\eta)^{n_2}.$$

For the unconditional probability we have

(14.3.37) $$P(z_\xi = 1, z_\eta = 1) = \mathscr{E}(1 - p_\xi - p_\eta)^{n_2}$$

where $\mathscr{E}$ is taken with respect to the distribution of $(u_1, \ldots, u_{n_1})$.

Similarly,

(14.3.38) $$P(z_\xi = 1 \mid u_1, \ldots, u_{n_1}) = (1 - p_\xi)^{n_2}, \qquad P(z_\eta = 1 \mid u_1, \ldots, u_{n_1}) = (1 - p_\eta)^{n_2}$$

and for the unconditional probabilities $P(z_\xi = 1)$ and $P(z_\eta = 1)$ we have

(14.3.39) $$P(z_\xi = 1)\, P(z_\eta = 1) = \mathscr{E}(1 - p_\xi)^{n_2} \cdot \mathscr{E}(1 - p_\eta)^{n_2}.$$

By methods similar to those used in setting up the expression in (14.3.26)
for $P(z_\xi = 1)$, we can set up a four-dimensional integral for
$P(z_\xi = 1, z_\eta = 1)$, that is, for $\mathscr{E}(1 - p_\xi - p_\eta)^{n_2}$,
and it can be shown by a considerable amount of analysis (see Blum and
Weiss (1957)), which we shall omit, that

(14.3.40) $$\mathscr{E}(1 - p_\xi - p_\eta)^{n_2} \leq \mathscr{E}(1 - p_\xi)^{n_2} \cdot \mathscr{E}(1 - p_\eta)^{n_2}.$$

But (14.3.40) is equivalent to the statement that

(14.3.41) $$P(z_\xi = 1, z_\eta = 1) - P(z_\xi = 1) \cdot P(z_\eta = 1) \leq 0.$$

Referring to (14.3.35) we therefore obtain

(14.3.42) $$\sigma_{\xi\eta} \leq 0, \qquad \xi \neq \eta = 1, \ldots, n_1 + 1$$

which implies the following inequality from (14.3.34)

(14.3.43) $$\sigma^2\left(\frac{s_0}{n_1 + 1} \,\Big|\, \mathscr{J}^*\right) \leq \frac{1}{(n_1 + 1)^2} \sum_{\xi=1}^{n_1+1} \sigma^2(z_\xi).$$

But

(14.3.44) $$\sigma^2(z_\xi) = P(z_\xi = 1) - [P(z_\xi = 1)]^2 = P(z_\xi = 1)[1 - P(z_\xi = 1)] \leq \tfrac{1}{4}$$

for $\xi = 1, \ldots, n_1 + 1$.
Thus, we finally obtain

(14.3.45) $$\sigma^2\left(\frac{s_0}{n_1 + 1} \,\Big|\, \mathscr{J}^*\right) \leq \frac{1}{4(n_1 + 1)}$$

which implies that $\sigma^2(s_0/(n_1 + 1) \mid \mathscr{J}^*) \to 0$ as $n_1 \to \infty$, which completes the
argument for 14.3.5.

Blum and Weiss (1957) have considered subclasses of alternatives against which
arbitrary block frequency counts are consistent for testing the hypothesis
that $(F_1, F_2) \in \mathscr{J}_0$. Wilks (1961) has considered a chi-square-like
test based on $s_0, s_1, \ldots, s_k$ for fixed $k$ and large samples, the critical region
being the event in the sample space of the two samples for which

(14.3.46) $$\sum_{i=0}^{k} \frac{(s_i - n a_i)^2}{n a_i} + \frac{u^2 + v^2}{n \rho^2 (1 + \rho) a_k} > \chi^2_\alpha$$

where $a_i = \rho^i / (1 + \rho)^{i+1}$,

(14.3.47) $$u = \sum_{i=0}^{k} (s_i - n a_i)(i - \rho - k - 1), \qquad v = \sqrt{\rho(1 + \rho)} \sum_{i=0}^{k} (s_i - n a_i)$$

and $\chi^2_\alpha$ is the $100\alpha\%$ point of the chi-square distribution $C(k + 1)$. This
test, as one might expect, is somewhat more powerful than the test
based only on $s_0$.

Remark. The empty block test (or the chi-square-like test above) can be extended
to the case of $k$-dimensional distributions having continuous c.d.f.'s. In this case
we consider $O_{n_1}$ as a generator of $k$-dimensional sample blocks
$B_k^{(1)}, \ldots, B_k^{(n_1+1)}$ as described in Section 8.7(c). The blocks are generated
in a specific order depending on the system of ordering functions used, whereas
$O_{n_2}$ generates points in these blocks. The entire configuration of numbers of
points in the sequence of blocks are the components of the random variable
$(r_1, \ldots, r_{n_1+1})$ where $r_1 + \cdots + r_{n_1+1} = n_2$.
The number of empty blocks is the random variable $s_0$ and its distribution is
given by (14.3.13) if the c.d.f.'s of the samples are identical. The test can be
shown to be consistent in the $k$-dimensional case as in the one-dimensional case
under certain conditions on the c.d.f.'s.

(d) The Run Test 

If we denote each order statistic of $O_{n_1}$ by $C$ and each order statistic
of $O'_{n_2}$ by $C'$, then any arrangement of all the combined order statistics of
$O_{n_1}$ and $O'_{n_2}$ corresponds to a permutation of $n_1$ $C$'s and $n_2$ $C'$'s. Now
under the conditions of 14.3.1 all the $\binom{n_1+n_2}{n_1}$ possible arrangements of
$n_1$ $C$'s and $n_2$ $C'$'s are equally probable. Thus, all of the theory of runs of
Section 6.6 applies to the runs generated by the possible sequences
of $C$'s and $C'$'s. (Note that in Section 6.6 $n_1 + n_2$ is denoted by $n$.)

The statistical function proposed by Wald and Wolfowitz (1940) as
the two-sample test is the total number of runs $u$ in a sequence of $C$'s and
$C'$'s (that is, in a combined sequence of the order statistics of $O_{n_1}$ and
$O'_{n_2}$) as defined in Section 6.6(a). The p.f. of $u$ if $(F_1(x), F_2(x)) \in \mathcal{F}_0$ is given by
(6.6.20), whereas $\mathscr{E}(u)$ and $\sigma^2(u)$ are given by (6.6.19). Note that the
random variables $u$ and $S_0$ are strongly negatively correlated.

Since small values of $u$ would tend to cast doubt on the truth of the
hypothesis that $(F_1(x), F_2(x)) \in \mathcal{F}_0$, we define the critical set of values of $u$
as the set for which

(14.3.48)  $P(u \le M(\alpha, n_1, n_2)) \le \alpha$,

where $M(\alpha, n_1, n_2)$ is the largest integer for which (14.3.48) holds. Let us
call this test $W_u$, its size being $\alpha$ approximately.

For large values of $n_1$ and $n_2$, $u$ is approximately normally distributed
with the mean and variance given by (6.6.19). More precisely, Wald and
Wolfowitz have established the following theorem:

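14.3.6 If $n_1$, $n_2$, when $(F_1(x), F_2(x)) \in \mathcal{F}_0$, are allowed to increase indefinitely
so that $n_2/n_1 \to \rho > 0$, then $u$ is asymptotically distributed according to

$N\!\left(\dfrac{2\rho n_1}{1+\rho},\ \dfrac{4\rho^2 n_1}{(1+\rho)^3}\right)$.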





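The proof of 14.3.6 is similar to that given for 9.6.3, and we shall only
give a brief outline here, omitting the details.

Let

$y = \left(u - \dfrac{2 n_1 \rho}{1+\rho}\right) \dfrac{\sqrt{(1+\rho)^3}}{2\rho\sqrt{n_1}}$

from which it is seen that

(14.3.49)  $u = \dfrac{2\rho\sqrt{n_1}\,\bigl(y + \sqrt{n_1(1+\rho)}\bigr)}{\sqrt{(1+\rho)^3}}$.

Now denote by $p_1(u)$ and $p_2(u)$ the expressions for $p(u)$ in (6.6.20) for even
and odd values of $u$, respectively.

Then for a given $n_1$ and given numbers $y' < y''$ we have

(14.3.50)  $P(y' < y < y'') = \sum\nolimits_1 p_1\!\left(\dfrac{2\rho\sqrt{n_1}\,(y + \sqrt{n_1(1+\rho)})}{\sqrt{(1+\rho)^3}}\right) + \sum\nolimits_2 p_2\!\left(\dfrac{2\rho\sqrt{n_1}\,(y + \sqrt{n_1(1+\rho)})}{\sqrt{(1+\rho)^3}}\right)$,

where $\sum_1$ denotes summation over the discrete values of $y$ between $y'$ and
$y''$ yielded by (14.3.49) for even values of $u$, whereas $\sum_2$ is a similar sum
for odd values of $u$.

Now consider the first sum, which may be written

(14.3.51)  $\sum\nolimits_1 p_1\!\left(\dfrac{2\rho\sqrt{n_1}\,(y + \sqrt{n_1(1+\rho)})}{\sqrt{(1+\rho)^3}}\right) \sqrt{\dfrac{n_1\rho^2}{(1+\rho)^3}}\ \Delta y$

where $\Delta y = \sqrt{(1+\rho)^3/(n_1\rho^2)}$. By using Stirling's formula (7.6.29) for
large factorials on all factorials in the expression for $p_1(u)$ (that is, $p(u)$ for
$u$ even), it will be found that as $n_1 \to \infty$ the expression (14.3.51)
converges to

(14.3.52)  $\dfrac{1}{2\sqrt{2\pi}} \displaystyle\int_{y'}^{y''} e^{-t^2/2}\, dt$.

Similarly, it will be found that the second sum in (14.3.50) also converges
to the integral (14.3.52) as $n_1 \to \infty$. Hence, by adding the two limits we
obtain

(14.3.53)  $\lim_{n_1 \to \infty} P(y' < y < y'') = \dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{y'}^{y''} e^{-t^2/2}\, dt$

for any two fixed numbers $y' < y''$, which is the assertion of 14.3.6.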

If we let $n_1, n_2 \to \infty$ so that $\lim n_2/n_1 = \rho > 0$ and if $(F_1(x), F_2(x)) \in \mathcal{F}_0$,
it follows from 14.3.6 that

(14.3.54)  $\lim_{n_1 \to \infty} P(u \le M(\alpha, n_1, n_2)) = \alpha$.

Wald and Wolfowitz have shown that the $W_u$ test is consistent for
testing $(F_1(x), F_2(x)) \in \mathcal{F}_0$ against alternatives in the class referred to in
14.3.5.
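
As an illustration of the run test, the following is a minimal sketch, assuming two samples with no ties, that counts the total number of runs $u$ in the combined ordered sequence and standardizes it with the limiting mean and variance stated in 14.3.6 (taking $\rho = n_2/n_1$). The function names `runs_statistic` and `runs_z` are hypothetical and chosen only for this example.

```python
import math

def runs_statistic(x, xp):
    """Total number of runs u in the combined ordered sequence.

    Label 0 marks an element of the first sample (a C), label 1 an element
    of the second sample (a C'). Ties are assumed not to occur.
    """
    labels = [lab for _, lab in sorted([(v, 0) for v in x] + [(v, 1) for v in xp])]
    # A new run starts at every change of label.
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def runs_z(u, n1, n2):
    """Standardize u using the limiting mean and variance of 14.3.6."""
    rho = n2 / n1
    mean = 2 * rho * n1 / (1 + rho)
    var = 4 * rho ** 2 * n1 / (1 + rho) ** 3
    return (u - mean) / math.sqrt(var)

x = [0.1, 0.4, 0.5, 0.9]
xp = [0.2, 0.3, 0.6, 0.7]
u = runs_statistic(x, xp)          # sequence C C' C' C C C' C' C
z = runs_z(u, len(x), len(xp))
print(u, z)
```

Small values of `z` (few runs) point toward rejection. For samples as small as these the exact p.f. (6.6.20) would be used rather than the normal approximation.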


(e) The Smirnov Test 

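Let $F_{1n_1}(x)$ and $F_{2n_2}(x)$ be the empirical c.d.f.'s determined by $O_{n_1}$ and
$O'_{n_2}$ as defined in (11.6.2). The Smirnov (1939a) statistic

(14.3.55)  $D_{n_1,n_2} = \sup_x |F_{1n_1}(x) - F_{2n_2}(x)|$

suggests itself as a reasonable criterion for testing the hypothesis that
$(F_1(x), F_2(x)) \in \mathcal{F}_0$.

More precisely let the critical region be the event in the sample space of $O_{n_1}$
and $O'_{n_2}$ for which $D_{n_1,n_2} > d(\alpha, n_1, n_2)$, where

(14.3.56)  $P(D_{n_1,n_2} > d(\alpha, n_1, n_2)) \le \alpha$

and $d(\alpha, n_1, n_2)$ is chosen as the smallest number for which (14.3.56)
holds, if $(F_1(x), F_2(x)) \in \mathcal{F}_0$.

Smirnov has shown that if $(F_1(x), F_2(x)) \in \mathcal{F}_0$ and if $n_2/n_1 \to \rho > 0$,
as $n_1, n_2 \to \infty$, then

(14.3.57)  $\lim_{n_1,n_2 \to \infty} P\!\left(D_{n_1,n_2} \le \lambda\sqrt{1/n_1 + 1/n_2}\right) = \sum_{i=-\infty}^{\infty} (-1)^i e^{-2i^2\lambda^2}$.

Therefore, by taking $d(\alpha, n_1, n_2) = \lambda_\alpha\sqrt{1/n_1 + 1/n_2}$, where $\lambda_\alpha$ satisfies

(14.3.58)  $\sum_{i=-\infty}^{\infty} (-1)^i e^{-2i^2\lambda_\alpha^2} = 1 - \alpha$

we see that as $n_1, n_2 \to \infty$ so that $n_2/n_1 \to \rho > 0$

(14.3.59)  $\lim_{n_1,n_2 \to \infty} P\!\left(D_{n_1,n_2} > \lambda_\alpha\sqrt{1/n_1 + 1/n_2}\right) = \alpha$.

Now the Smirnov test is consistent for testing $(F_1(x), F_2(x)) \in \mathcal{F}_0$ against alternatives
$(F_1(x), F_2(x)) \in \mathcal{F} - \mathcal{F}_0$; that is, if $F_1(x) \not\equiv F_2(x)$, it can be shown that as
$n_1, n_2 \to \infty$, with $n_2/n_1 \to \rho > 0$,

(14.3.60)  $\lim_{n_1,n_2 \to \infty} P\!\left(D_{n_1,n_2} > \lambda_\alpha\sqrt{1/n_1 + 1/n_2}\right) = 1$.

The proof is omitted.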
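
The large-sample recipe above can be sketched numerically. The following minimal example, assuming samples without ties, computes the statistic of (14.3.55), evaluates the limiting series in (14.3.57), and solves (14.3.58) for $\lambda_\alpha$ by bisection. The function names, the truncation at 100 terms, and the bisection bracket $[0.2, 5]$ are assumptions made for this illustration only.

```python
import math
from bisect import bisect_right

def smirnov_statistic(x, xp):
    """sup_x |F_1(x) - F_2(x)| for the two empirical c.d.f.'s."""
    xs, ys = sorted(x), sorted(xp)
    n1, n2 = len(xs), len(ys)
    # The supremum is attained at one of the observed sample points.
    return max(abs(bisect_right(xs, t) / n1 - bisect_right(ys, t) / n2)
               for t in xs + ys)

def limiting_cdf(lam, terms=100):
    """Series of (14.3.57): sum over all i of (-1)^i exp(-2 i^2 lam^2)."""
    if lam <= 0:
        return 0.0
    tail = sum((-1) ** i * math.exp(-2 * i * i * lam * lam)
               for i in range(1, terms + 1))
    return 1 + 2 * tail

def lambda_alpha(alpha, lo=0.2, hi=5.0):
    """Solve limiting_cdf(lam) = 1 - alpha by bisection (the series is increasing in lam)."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if limiting_cdf(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def critical_value(alpha, n1, n2):
    """Large-sample d(alpha, n1, n2) = lambda_alpha * sqrt(1/n1 + 1/n2)."""
    return lambda_alpha(alpha) * math.sqrt(1 / n1 + 1 / n2)

print(smirnov_statistic([0.1, 0.4, 0.5, 0.9], [0.2, 0.3, 0.6, 0.7]))
print(lambda_alpha(0.05))
```

The hypothesis of identical c.d.f.'s would be rejected at approximate level $\alpha$ when `smirnov_statistic` exceeds `critical_value`; for small samples the exact distribution should be preferred.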

Note that the random variable $D_{n_1,n_2}$ is a function of the block frequencies
$(r_1, \ldots, r_{n_1+1})$, and hence for any fixed $d$, $P(D_{n_1,n_2} \le d)$ can be computed
for small samples by counting sample points in the sample space of
$(r_1, \ldots, r_{n_1+1})$ for which $D_{n_1,n_2} \le d$ and dividing by $\binom{n_1+n_2}{n_1}$. This is
rather involved if $n_1 \ne n_2$. However, if $n_1 = n_2 = n$, $D_{n,n}$ is a multiple
of $1/n$. In this case Massey (1951) has shown by argument similar to that
used in establishing 11.7.1 that

(14.3.61) P ( =1^0’ «)/ ( 7 ) 

where U(^, rj), f = 0,1,..., 2A: — 1, = 1,..., b is the number of 

sample points in the sample space of (r^ ..., r^^+i) for which ^ln(*(,)) = 
+ v - .*)/"» - ^ 2 n(*)l < kjn for X < X(,) being the rjth 

order statistic of 0„^). (7(f, rj) satisfies the difference equations 

f/(0, ri+l)== U(0, rj) + U(l, rj) 

(/(I, rj+\)= U(0, rj) + UiU rj) + U(2, rj) 


(14.3.62) 


U(2k ^2,rj+l)=U(0,rj) + -- + U(2k - 1, rj) 

U(2k ^lr}+l) = U(0,rj) + --- + U{2k - 1, rj) 

subject to the boundary conditions 

(14.3.63) C/(/, 0) = 0 I = 1 ,..., A: -r 1 
U{k, 0) = 1. 

Similar formulas can be developed for $n_1 \ne n_2$, in which case $D_{n_1,n_2}$ is a
multiple of $1/n_1 n_2$. Massey (1951), using such recurrence relations, has
computed tables of the distribution of $D_{n_1,n_2}$ at the points $k/n_1 n_2$,
$k = 1, 2, \ldots, n_1 n_2$, for $n_1 \le n_2 \le 10$.

A fairly simple explicit formula for the distribution of $D_{n_1,n_2}$ for the case of
$n_1 = n_2 = n$ has been found by Gnedenko and Koroliuk (1951) which
deserves mention here.

To obtain the Gnedenko-Koroliuk result let $D_n^+$ and $D_n^-$ be random
variables defined by

(14.3.64)  $D_n^+ = \sup_x\, (F_{1n}(x) - F_{2n}(x))$

(14.3.64a)  $D_n^- = \inf_x\, (F_{1n}(x) - F_{2n}(x))$.

Let the order statistics of the two samples combined be $z_{(1)} < z_{(2)} < \cdots < z_{(2n)}$,
and let $\xi_t$ be a random variable having value $1/n$ if $z_{(t)}$ is an $x$
(that is, belongs to the first sample) and $-1/n$ if $z_{(t)}$ is an $x'$ (that is,
belongs to the second sample), $t = 1, \ldots, 2n$. Let

(14.3.65)  $s_t = \xi_1 + \cdots + \xi_t$.

Then it is seen that

(14.3.66)  $D_n^+ = \sup_t s_t$,  $D_n^- = \inf_t s_t$.


If we assign $s_0 = 0$ and consider the graph of the points $(t, s_t)$, $t = 0, 1, \ldots, 2n$,
in the $(t, s)$-plane, connecting the sequence of points by line segments,
we have a path which begins at $(0, 0)$ and ends at $(2n, 0)$. Note that
between any two successive integral values of $t$ the path either rises or
falls. There are $\binom{2n}{n}$ possible paths, one path corresponding to each
possible sequence of $x$'s and $x'$'s among the order statistics $z_{(1)} < \cdots < z_{(2n)}$
of the two samples. Under the assumption that $(F_1(x), F_2(x)) \in \mathcal{F}_0$
all of these paths are equally probable. The problem of determining the
value of $P(D_n^+ \le k/n)$ therefore reduces to that of determining the number
of paths that lie entirely below the line $s = (k + 1)/n$, and then dividing
this number by $\binom{2n}{n}$. Similarly, the problem of finding the value of
$P(D_{n,n} \le k/n)$ is equivalent to determining the number of paths lying
entirely between the lines $s = \pm(k + 1)/n$ and dividing this number by
$\binom{2n}{n}$. Denote these lines by $L^+$ and $L^-$, respectively. Consider $P(D_n^+ \le k/n)$
first and let us examine any path which does not lie entirely below
the line $L^+$. There is a first point $A$ where the path meets line $L^+$. Now,
corresponding to the portion of the path from $A$ to $(2n, 0)$ there is a
mirror image of this path with respect to $L^+$ passing from $A$ to
$(2n, 2(k + 1)/n)$. Therefore the number of paths from $(0, 0)$ meeting the line
$L^+$ and passing on to $(2n, 0)$ is equal to the number from $(0, 0)$ to the line
$L^+$ and passing along the image path to $(2n, 2(k + 1)/n)$, which is merely
$\binom{2n}{n+k+1}$, the number of ways $n + k + 1$ rises and $n - k - 1$
falls can be permuted. The number of paths which do not meet line $L^+$
is therefore $\binom{2n}{n} - \binom{2n}{n+k+1}$, and hence we have

(14.3.67)  $P(D_n^+ \le k/n) = 1 - \dbinom{2n}{n+k+1} \Big/ \dbinom{2n}{n}$.
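By considerations of symmetry it will be seen that

(14.3.68a)  $P(D_n^- \ge -k/n) = P(D_n^+ \le k/n)$.

If we let $\lambda = k/\sqrt{2n}$, then we have

(14.3.69)  $P\!\left(\sqrt{n/2}\, D_n^+ \le \lambda\right) = 1 - \dbinom{2n}{n + \lambda\sqrt{2n} + 1} \Big/ \dbinom{2n}{n}$.

Making use of Stirling's formula (7.6.29) for large factorials we find that

(14.3.70)  $\lim_{n \to \infty} \dbinom{2n}{n + \lambda\sqrt{2n} + 1} \Big/ \dbinom{2n}{n} = e^{-2\lambda^2}$

and hence

(14.3.71)  $\lim_{n \to \infty} P\!\left(\sqrt{n/2}\, D_n^+ \le \lambda\right) = 1 - e^{-2\lambda^2}$.

Summarizing,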


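14.3.7 If $F_{1n}(x)$ and $F_{2n}(x)$ are empirical c.d.f.'s in two independent samples
each of size $n$ from identical continuous c.d.f.'s and if $D_n^+$ and $D_n^-$
are defined as in (14.3.64) and (14.3.64a), then $P(D_n^+ \le k/n)$ and
$P(D_n^- \ge -k/n)$ have the same value, say $P_n^*(k/\sqrt{2n})$, where

(14.3.72)  $P_n^*(k/\sqrt{2n}) = 1 - \dbinom{2n}{n+k+1} \Big/ \dbinom{2n}{n}$

and

(14.3.73)  $\lim_{n \to \infty} P_n^*(\lambda) = 1 - e^{-2\lambda^2}$.

The reader should note that $P_n(\lambda/\sqrt{n})$, where $P_n(d)$ is defined in (11.6.5)
for one sample of size $n$, has the same limit as $P_n^*(\lambda)$ as $n \to \infty$.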
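
The reflection argument behind (14.3.67) can be checked by direct enumeration for small $n$. The sketch below is an illustration only: it lists all $\binom{2n}{n}$ equally likely arrangements, records the peak of the path $s_t$ in units of $1/n$, and compares the resulting proportion with the closed form. The function names are hypothetical.

```python
from itertools import combinations
from math import comb

def prob_dplus_le_bruteforce(n, k):
    """P(D+ <= k/n): share of the C(2n, n) paths whose height never exceeds k/n."""
    good = 0
    for x_pos in combinations(range(2 * n), n):   # positions occupied by x's
        xs = set(x_pos)
        s = peak = 0                              # heights in units of 1/n
        for t in range(2 * n):
            s += 1 if t in xs else -1             # rise for an x, fall for an x'
            peak = max(peak, s)
        good += peak <= k
    return good / comb(2 * n, n)

def prob_dplus_le_formula(n, k):
    """Closed form (14.3.67); math.comb returns 0 when n + k + 1 > 2n."""
    return 1 - comb(2 * n, n + k + 1) / comb(2 * n, n)

for n in range(1, 6):
    for k in range(0, n + 1):
        assert abs(prob_dplus_le_bruteforce(n, k) - prob_dplus_le_formula(n, k)) < 1e-12
print(prob_dplus_le_formula(5, 2))
```

For $k = 0$ the formula reduces to $1/(n+1)$, the proportion of paths that never rise above the axis.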

Now we consider the problem of determining the value of $P(D_{n,n} \le k/n)$.
Let $S_0^+$ be the set of paths which meet $L^+$ and $S_0^-$ the set which meet
$L^-$. Then $S_0^+ \cup S_0^-$ consists of all paths which do not lie entirely between
$L^+$ and $L^-$. Let $N(E)$ be the number of paths in any set $E$ of paths. Then
we have

(14.3.74)  $N(S_0^+ \cup S_0^-) = N(S_0^+) + N(S_0^-) - N(S_0^+ \cap S_0^-)$

(14.3.75)  $N(S_0^+) = N(S_0^-) = \dbinom{2n}{n+k+1}$.

Now $S_0^+ \cap S_0^-$ consists of the union of two sets $S_1^+$ and $S_1^-$ where $S_1^+$
consists of all paths which contain at least one segment joining $L^+$ to $L^-$
and similarly $S_1^-$ consists of all paths which contain at least one segment
joining $L^-$ to $L^+$. Then we have

(14.3.76)  $N(S_0^+ \cap S_0^-) = N(S_1^+ \cup S_1^-) = N(S_1^+) + N(S_1^-) - N(S_1^+ \cap S_1^-)$.

By reasoning similar to that by which $N(S_0^+)$ and $N(S_0^-)$ were determined
we find that

(14.3.77)  $N(S_1^+) = N(S_1^-) = \dbinom{2n}{n+2(k+1)}$.

Defining $S_i^+$ as the set of paths which contain at least $i$ segments joining
$L^+$ and $L^-$, the first beginning from $L^+$, with a similar definition for $S_i^-$,
we therefore find that

(14.3.78)  $N(S_0^+ \cup S_0^-) = 2\left[\dbinom{2n}{n+k+1} - \dbinom{2n}{n+2(k+1)} + \cdots + (-1)^{r+1}\dbinom{2n}{n+r(k+1)}\right]$

where $r = [n/(k + 1)]$, the largest integer in $n/(k + 1)$.

Since $N(D_{n,n} \le k/n) = \binom{2n}{n} - N(S_0^+ \cup S_0^-)$ we find by dividing by
$\binom{2n}{n}$ that

(14.3.79)  $P(D_{n,n} \le k/n) = 1 + 2\sum_{i=1}^{r} (-1)^i \dbinom{2n}{n+i(k+1)} \Big/ \dbinom{2n}{n} = \sum_{i=-r}^{r} (-1)^i \dbinom{2n}{n+i(k+1)} \Big/ \dbinom{2n}{n}$.

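Now let us examine $P(\sqrt{n/2}\, D_{n,n} \le \lambda)$ as $n \to \infty$. It will be convenient to
denote $\binom{2n}{n+i(k+1)} \big/ \binom{2n}{n}$ by $g_n(i, k)$, and let $k = \lambda\sqrt{2n}$. Then for an
arbitrary $\epsilon > 0$, we can choose integers $r'$ and $n_1(r', \epsilon)$ so that

(14.3.80)  $\left| 2\sum_{i=r'+1}^{\infty} (-1)^i e^{-2i^2\lambda^2} \right| < \dfrac{\epsilon}{3}$

(14.3.81)  $\left| \sum_{i=-r}^{r} (-1)^i g_n(i, \lambda\sqrt{2n}) - \sum_{i=-r'}^{r'} (-1)^i g_n(i, \lambda\sqrt{2n}) \right| < \dfrac{\epsilon}{3}$

for $n > n_1(r', \epsilon)$. Applying Stirling's formula for large factorials to
$g_n(i, \lambda\sqrt{2n})$, $i = 0, \pm 1, \ldots, \pm r'$ as we did in (14.3.70), it can be
verified that there is an $n_2(r', \epsilon)$ so that for $n > n_2(r', \epsilon)$, we have

(14.3.82)  $\left| \sum_{i=-r'}^{r'} (-1)^i g_n(i, \lambda\sqrt{2n}) - \sum_{i=-r'}^{r'} (-1)^i e^{-2i^2\lambda^2} \right| < \dfrac{\epsilon}{3}$.

Combining (14.3.80), (14.3.81), and (14.3.82) we find that

(14.3.83)  $\left| P\!\left(\sqrt{n/2}\, D_{n,n} \le \lambda\right) - \sum_{i=-\infty}^{\infty} (-1)^i e^{-2i^2\lambda^2} \right| < \epsilon$

for $n > \max(n_1, n_2)$.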

Summarizing we obtain the following result:

14.3.8 If $F_{1n}(x)$ and $F_{2n}(x)$ are empirical c.d.f.'s in independent samples of
size $n$ from identical continuous c.d.f.'s, then

(14.3.84)  $P(D_{n,n} \le k/n) = \sum_{i=-r}^{r} (-1)^i \dbinom{2n}{n+i(k+1)} \Big/ \dbinom{2n}{n}$

where $r$ is the largest integer in $n/(k + 1)$, and

(14.3.85)  $\lim_{n \to \infty} P\!\left(\sqrt{n/2}\, D_{n,n} \le \lambda\right) = \sum_{i=-\infty}^{\infty} (-1)^i e^{-2i^2\lambda^2}$,

where $D_{n,n}$ is defined in (14.3.55).

The reader will remember, of course, that (14.3.85) is essentially a
special case of (14.3.57) for samples of equal size $n$. It should be noted
that $P(\sqrt{n}\, D_n \le \lambda)$, where $D_n$ is defined for one sample of size $n$ in (11.7.1),
has the same limit as (14.3.85).
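
The exact formula (14.3.84) lends itself to the same kind of numerical check as the one-sided case. The following sketch, offered as an illustration with hypothetical function names, compares the alternating binomial sum with a direct count of the paths that stay between the lines $s = \pm(k+1)/n$.

```python
from itertools import combinations
from math import comb

def prob_d_le_bruteforce(n, k):
    """P(D_{n,n} <= k/n) by enumerating all C(2n, n) equally likely paths."""
    good = 0
    for x_pos in combinations(range(2 * n), n):   # positions occupied by x's
        xs = set(x_pos)
        s = 0
        inside = True
        for t in range(2 * n):
            s += 1 if t in xs else -1             # path height in units of 1/n
            if abs(s) > k:                        # path has met L+ or L-
                inside = False
                break
        good += inside
    return good / comb(2 * n, n)

def prob_d_le_formula(n, k):
    """Alternating binomial sum of (14.3.84), with r = [n / (k + 1)]."""
    r = n // (k + 1)
    total = 0
    for i in range(-r, r + 1):
        sign = -1 if i % 2 else 1
        total += sign * comb(2 * n, n + i * (k + 1))
    return total / comb(2 * n, n)

for n in range(1, 7):
    for k in range(0, n + 1):
        assert abs(prob_d_le_bruteforce(n, k) - prob_d_le_formula(n, k)) < 1e-12
print(prob_d_le_formula(2, 1))
```

With $k = 0$ the sum vanishes, as it must, since $D_{n,n}$ is never smaller than $1/n$.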


(f) The Mann-Whitney (Wilcoxon) Test 

We shall now discuss an important two-sample test originally proposed 
by Wilcoxon (1945) for samples of equal size, but extended to samples of 
unequal size by Mann and Whitney (1947), who also obtained certain 
asymptotic results for large samples. 

Suppose $O_{n_1}: (x_1, \ldots, x_{n_1})$ and $O'_{n_2}: (x'_1, \ldots, x'_{n_2})$ are samples from
continuous c.d.f.'s $F_1(x)$ and $F_2(x)$, respectively. Let $z_{\xi\eta}$, $\xi = 1, \ldots, n_1$,
$\eta = 1, \ldots, n_2$ be a set of random variables defined as follows

(14.3.86)  $z_{\xi\eta} = 1$ if $x_\xi < x'_\eta$,  $z_{\xi\eta} = 0$ if $x_\xi > x'_\eta$,

and let

(14.3.87)  $U = \sum_{\eta=1}^{n_2} \sum_{\xi=1}^{n_1} z_{\xi\eta}$.

Thus, if $(x_1, \ldots, x_{n_1})$ and $(x'_1, \ldots, x'_{n_2})$ are arranged in a sequence of
increasing order of magnitude, $U$ denotes the total number of times an $x$
precedes an $x'$. If all of the $x$'s and $x'$'s in this sequence are assigned ranks
from 1 to $n_1 + n_2$, the sum $T$ of the ranks of the $x$'s is the Wilcoxon
statistic. It can be verified that

(14.3.88)  $U + T = n_1 n_2 + \dfrac{n_1(n_1 + 1)}{2}$.

Actually, Wilcoxon considered $T$ only for the case of samples of equal size.

The Mann-Whitney test is a one-sided test based on the $U$ statistic, and
is constructed so that the critical region $W_U$ of size $\alpha$ consists of the event
in the sample space of the two samples for which $U = 0, 1, \ldots, U_\alpha$, where
$U_\alpha$ is the largest integer for which

(14.3.89)  $P(U \le U_\alpha \mid (F_1, F_2) \in \mathcal{F}_0) \le \alpha$.

If we let $p_{n_1 n_2}(U)$ be the p.f. of $U$, assuming $(F_1, F_2) \in \mathcal{F}_0$, then
$p_{n_1 n_2}(U)$ satisfies the difference equation

(14.3.90)  $p_{n_1 n_2}(U) = \dfrac{n_1}{n_1 + n_2}\, p_{n_1 - 1,\, n_2}(U) + \dfrac{n_2}{n_1 + n_2}\, p_{n_1,\, n_2 - 1}(U - n_1)$,

where $p_{n_1 0}(U) = p_{0 n_2}(U) = 1$ if $U = 0$, and 0 if $U \ne 0$. Using this difference
equation Mann and Whitney (1947) have tabulated values of $p_{n_1 n_2}(U)$ for
$1 \le n_1 \le n_2 \le 8$.
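
To make the definitions concrete, here is a minimal sketch, assuming samples without ties, that computes $U$ and $T$, confirms the identity (14.3.88) on a small example, and evaluates the null p.f. of $U$ from a recurrence of the form (14.3.90) obtained by conditioning on whether the largest of the $n_1 + n_2$ observations is an $x$ or an $x'$. Exact rational arithmetic is used; the function names are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache

def u_and_t(x, xp):
    """U = number of (x, x') pairs with x < x'; T = sum of the ranks of the x's."""
    u = sum(1 for a in x for b in xp if a < b)
    combined = sorted(x + xp)                       # no ties assumed
    t = sum(combined.index(a) + 1 for a in x)
    return u, t

@lru_cache(maxsize=None)
def pf(n1, n2, u):
    """Null p.f. of U, conditioning on the largest combined observation."""
    if u < 0:
        return Fraction(0)
    if n1 == 0 or n2 == 0:                          # boundary: U can only be 0
        return Fraction(1 if u == 0 else 0)
    # Largest is an x: it precedes no x'. Largest is an x': all n1 x's precede it.
    return (Fraction(n1, n1 + n2) * pf(n1 - 1, n2, u)
            + Fraction(n2, n1 + n2) * pf(n1, n2 - 1, u - n1))

x = [0.1, 0.4, 0.5, 0.9]
xp = [0.2, 0.3, 0.6, 0.7]
U, T = u_and_t(x, xp)
n1, n2 = len(x), len(xp)
assert U + T == n1 * n2 + n1 * (n1 + 1) // 2        # identity (14.3.88)
print(U, T, pf(n1, n2, U))
```

Summing `pf(n1, n2, u)` over `u = 0, ..., U_alpha` gives the left side of (14.3.89), from which the critical value can be read off.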

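Now let us consider the mean and variance of $U$. In order to examine
the consistency of the Mann-Whitney test it will be convenient to determine
the mean and variance of $U$ for any two continuous c.d.f.'s $F_1(x)$, $F_2(x)$,
that is, for $(F_1, F_2) \in \mathcal{F}$.
We have,

(14.3.91)  $\mathscr{E}(U) = \mathscr{E}\!\left(\sum z_{\xi\eta}\right) = \sum \mathscr{E}(z_{\xi\eta}) = n_1 n_2 a$

where

(14.3.92)  $a = \displaystyle\int_{-\infty}^{\infty} F_1(x)\, dF_2(x)$.

Also,

(14.3.93)  $\mathscr{E}(U^2) = \mathscr{E}\!\left(\sum_{\xi,\eta} \sum_{\xi',\eta'} z_{\xi\eta} z_{\xi'\eta'}\right) = n_1 n_2 a + n_1 n_2 (n_1 - 1) b + n_1 n_2 (n_2 - 1) c + n_1 n_2 (n_1 - 1)(n_2 - 1) a^2$

where

(14.3.94)  $b = \displaystyle\int_{-\infty}^{\infty} (F_1(x))^2\, dF_2(x)$,  $c = \displaystyle\int_{-\infty}^{\infty} (1 - F_2(x))^2\, dF_1(x)$.

Therefore

(14.3.95)  $\sigma^2(U) = n_1 n_2 [a + (n_1 - 1) b + (n_2 - 1) c - (n_1 + n_2 - 1) a^2]$.

If $(F_1, F_2) \in \mathcal{F}_0$ then $a = \tfrac{1}{2}$, $b = c = \tfrac{1}{3}$ and

(14.3.96)  $\mathscr{E}(U) = \dfrac{n_1 n_2}{2}$,  $\sigma^2(U) = \dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}$.

Mann and Whitney have shown that if $(F_1, F_2) \in \mathcal{F}_0$, $U$ is asymptotically
distributed according to $N(n_1 n_2/2,\ n_1 n_2 (n_1 + n_2 + 1)/12)$ for large $n_1$ and $n_2$.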

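If we take the ratio $U/n_1 n_2$, then

(14.3.97)  $\mathscr{E}\!\left(\dfrac{U}{n_1 n_2}\right) = \dfrac{1}{2}$,  $\sigma^2\!\left(\dfrac{U}{n_1 n_2}\right) = \dfrac{n_1 + n_2 + 1}{12\, n_1 n_2}$

from which it can be verified by Chebychev's inequality (3.3.5) that as
$n_1, n_2 \to \infty$, $U/n_1 n_2$ converges in probability to the constant $\tfrac{1}{2}$; also
$U_\alpha/n_1 n_2$ converges in probability to $\tfrac{1}{2}$.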
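
The null moments can be confirmed for small samples by enumerating all equally likely arrangements. The sketch below, an illustration with hypothetical function names, evaluates the general variance expression (14.3.95) at $a = \tfrac12$, $b = c = \tfrac13$ and compares it with the exact mean and variance of $U$ for $n_1 = 3$, $n_2 = 4$.

```python
from fractions import Fraction
from itertools import combinations

def var_u(n1, n2, a, b, c):
    """Variance of U from (14.3.95)."""
    return n1 * n2 * (a + (n1 - 1) * b + (n2 - 1) * c - (n1 + n2 - 1) * a * a)

def null_moments_by_enumeration(n1, n2):
    """Exact mean and variance of U over all C(n1 + n2, n1) arrangements."""
    n = n1 + n2
    values = []
    for x_pos in combinations(range(n), n1):        # positions of the x's
        xs = set(x_pos)
        # U counts the (x, x') pairs in which the x comes first.
        u = sum(1 for i in x_pos for j in range(i + 1, n) if j not in xs)
        values.append(u)
    mean = Fraction(sum(values), len(values))
    second = Fraction(sum(u * u for u in values), len(values))
    return mean, second - mean * mean

half, third = Fraction(1, 2), Fraction(1, 3)
mean_u, variance_u = null_moments_by_enumeration(3, 4)
print(mean_u, variance_u, var_u(3, 4, half, third, third))
```

Both routes give mean $n_1 n_2/2 = 6$ and variance $n_1 n_2 (n_1 + n_2 + 1)/12 = 8$, in agreement with (14.3.96).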

Now let us examine the consistency of the $U$ test. Let $\mathcal{F}^-$ be the class
of pairs $(F_1(x), F_2(x))$ such that

(14.3.98)  $a < \tfrac{1}{2}$.

Then it can be verified that for any $(F_1(x), F_2(x)) \in \mathcal{F}^-$ we have $b < \tfrac{1}{2}$ and
$c < \tfrac{1}{2}$. The mean and variance of $U/n_1 n_2$ for any pair of c.d.f.'s in $\mathcal{F}^-$
are given by

(14.3.99)  $\mathscr{E}\!\left(\dfrac{U}{n_1 n_2}\right) = a$,  $\sigma^2\!\left(\dfrac{U}{n_1 n_2}\right) = \dfrac{a + (n_1 - 1) b + (n_2 - 1) c - (n_1 + n_2 - 1) a^2}{n_1 n_2}$

from which it is evident that as $n_1, n_2 \to \infty$, $U/n_1 n_2$ converges in probability
to a constant $a < \tfrac{1}{2}$, which implies that

(14.3.100)  $\lim_{n_1,n_2 \to \infty} P\!\left(\dfrac{U}{n_1 n_2} \le \dfrac{U_\alpha}{n_1 n_2} \,\Big|\, (F_1, F_2) \in \mathcal{F}^-\right) = 1$.

That is,

14.3.9 The Mann-Whitney test $W_U$ is consistent for testing the hypothesis
that $(F_1, F_2) \in \mathcal{F}_0$ against any
alternative $(F_1, F_2) \in \mathcal{F}^-$.

If $\mathcal{F}^+$ denotes the class of pairs $(F_1, F_2)$ such that $a > \tfrac{1}{2}$ we can interchange
$F_1$ and $F_2$, thus reducing this case to the one already considered.

If the Mann-Whitney test had been defined as a two-tail test with
critical region consisting of the integers $0, 1, \ldots, U_{\alpha/2}$ and $U'_{\alpha/2}, \ldots, n_1 n_2$,
where $U_{\alpha/2}$ is the largest integer for which $P(U \le U_{\alpha/2} \mid (F_1, F_2) \in \mathcal{F}_0) \le \alpha/2$
and $U'_{\alpha/2}$ is the smallest integer for which $P(U \ge U'_{\alpha/2} \mid (F_1, F_2) \in \mathcal{F}_0) \le \alpha/2$,
then the test would be consistent for testing $(F_1, F_2) \in \mathcal{F}_0$ against any
alternative in the class of all pairs of
continuous c.d.f.'s for which $a \ne \tfrac{1}{2}$. Note that the class of pairs for which $a = \tfrac{1}{2}$ is not the class of
pairs of identical continuous c.d.f.'s, but contains it.

14.4 THE METHOD OF RANDOMIZATION 

In this section we shall consider a procedure known as the method of 
randomization for constructing tests of certain nonparametric statistical 
hypotheses. There are two types of randomization tests: component 
randomization tests, and rank randomization tests. We shall consider 
them only briefly. The reader interested in further details is referred to 
Fraser (1957), Hoeffding (1948a), Lehmann (1959), and Lehmann and 
Stein (1949). 


(a) Component Randomization 

The method of component randomization was originally proposed by 
Fisher (1926A). This method is simple in principle but technically difficult 
to apply to particular problems, except for very small samples. The 
method in its simplest form for one sample is to consider as the sample
space the $n!$ points one obtains by permuting the components of the sample
$(x_1, \ldots, x_n)$. More generally, suppose $(x_1, \ldots, x_n)$ has a p.d.f. $f(x_1, \ldots, x_n)$.
Except for points in a set of probability 0, there are $n!$ distinct points in
the sample space of $(x_1, \ldots, x_n)$ whose coordinates are permutations
of the components of a given sample. Let this set of $n!$ points be denoted
by $S$. Then if $f_0(x_1, \ldots, x_n)$ is a p.d.f. which is symmetric in $x_1, \ldots, x_n$,
we have the following conditional probability attached to the point
$(x_{i_1}, \ldots, x_{i_n})$, where $i_1, \ldots, i_n$ is any permutation of $1, \ldots, n$:

(14.4.1)  $p_0(x_{i_1}, \ldots, x_{i_n} \mid S) = \dfrac{f_0(x_{i_1}, \ldots, x_{i_n})}{\sum^* f_0(x_{i_1}, \ldots, x_{i_n})} = \dfrac{1}{n!}$

where $\sum^*$ denotes summation over all permutations $i_1, \ldots, i_n$ of $1, \ldots, n$.
If $f_1(x_1, \ldots, x_n)$ is an alternative p.d.f. the probability attached to
$(x_{i_1}, \ldots, x_{i_n})$ is

(14.4.2)  $p_1(x_{i_1}, \ldots, x_{i_n} \mid S) = \dfrac{f_1(x_{i_1}, \ldots, x_{i_n})}{\sum^* f_1(x_{i_1}, \ldots, x_{i_n})}$.

If a test $W_\alpha$ of size $\alpha$ exists which is most powerful for testing
$f_0(x_1, \ldots, x_n)$ against $f_1(x_1, \ldots, x_n)$ and if for simplicity we choose $\alpha$ to be
a multiple of $1/n!$, then $W_\alpha$ must be the set of $n!\,\alpha$ points in $S$ for which
$P(W_\alpha \mid f_1)$ is a maximum. Such a test exists and is unique if the values of
$p_1$ at the $n!$ points of $S$ are all different, or more generally, if there is a
single point in $S$ such that the value of $p_1$ at that point is not greater than
the values of $p_1$ at $n!\,\alpha - 1$ points in $S$ and is greater than the values of $p_1$
at all other points in $S$. In order for such a test to be effective
it must have power for discriminating between $f_0$ and $f_1$ whatever sample
point $(x_1, \ldots, x_n)$ may occur, where the sample components are all
different.

It should be noted that $W_\alpha$ has no power for testing $f_0(x_1, \ldots, x_n)$
against $f_1(x_1, \ldots, x_n)$ if $f_1(x_1, \ldots, x_n)$ is also symmetric in $x_1, \ldots, x_n$.

In the preceding remarks we have, for the sake of simplicity, discussed
the ideas of component randomization for the case of a single sample
$(x_1, \ldots, x_n)$. The most interesting and important extensions of these ideas
are for problems of two or more samples, experimental designs, and
multidimensional samples. In all these problems, however, there is considerable
technical difficulty in the strict application of the method of
component randomization, except for very small sample sizes. The
difficulty is essentially that of evaluating the probability function $p_1$ over
the permutations of the sample components under the alternative to
the null hypothesis, and of selecting the required set of $\alpha\, n!$ points in $S$ for
the test $W_\alpha$. In order to make progress those who have utilized the method
of component randomization in constructing nonparametric tests have
relaxed from a strict application of component randomization principles
and have borrowed test functions $g(x_1, \ldots, x_n)$ from parametric testing
theory to use for determining a critical region $W_\alpha$. They have then
investigated moments and large sample properties of the distribution
functions of these test functions over the space of permutations $S$ of the
sample components under the null hypothesis and in some cases under
the alternative hypothesis. A critical region in the space of permutations
$S$ is then taken as the set of permutation points which yield values
of $g$ in some interval or set.

It is not feasible to consider all the useful component randomization 
tests that have been devised. It will be sufficient to give two examples. 
(b) Examples of Component Randomization Tests

The earliest component randomization test, known as the Fisher-Pitman
test, is a two-sample test designed to test the hypothesis that two samples
are from distributions having identical means against alternatives in which
the means are unequal. The test was originally treated in a specific
problem by Fisher (1926b); it was investigated more generally by Pitman
(1937b).

Suppose $O_{n_1}: (x_1, \ldots, x_{n_1})$ and $O'_{n_2}: (x'_1, \ldots, x'_{n_2})$ are two independent
samples from continuous c.d.f.'s. Let the two samples be pooled together
and let an arbitrary permutation of the $n_1 + n_2$ components of the two
samples be denoted by $y_1, \ldots, y_{n_1+n_2}$. Let $\bar{y}$, $s^2$ be the sample mean and
variance of $(y_1, \ldots, y_{n_1})$ and $\bar{y}'$, $s'^2$ the sample mean and variance of
$(y_{n_1+1}, \ldots, y_{n_1+n_2})$. The test function proposed is

(14.4.3)  $g(y_1, \ldots, y_{n_1+n_2}) = \dfrac{n_1 n_2 (\bar{y} - \bar{y}')^2}{(n_1 + n_2)[(n_1 - 1) s^2 + (n_2 - 1) s'^2] + n_1 n_2 (\bar{y} - \bar{y}')^2}$.

Note that $g(y_1, \ldots, y_{n_1+n_2})$ is of form $t^2/(n_1 + n_2 - 2 + t^2)$ where $t$ is the
Student $t$ ratio for the hypothesis that the two samples are from identical
normal distributions, given that they came from two normal distributions
with equal variances.

Pitman has determined the first four moments of the conditional
distribution of $g(y_1, \ldots, y_{n_1+n_2})$ given $y_1, \ldots, y_{n_1+n_2}$, over the set of all
permutations of the $n_1 + n_2$ $y$'s. Since the denominator of $g$ is constant
over all permutations of the $y$'s, the only part of $g$ which has to be considered
is the numerator which has at most $\binom{n_1+n_2}{n_1}$ different values
corresponding to the number of different ways the $n_1 + n_2$ sample components
can be separated into two sets, one consisting of $n_1$ components
and the other consisting of $n_2$ components. Pitman has shown that the
distribution of $g$ over these permutations has approximately the beta
distribution $Be(\tfrac{1}{2}, \tfrac{1}{2}(n_1 + n_2 - 2))$. In case the two samples come from
identical normal distributions, the unconditional distribution of $g$ is exactly
$Be(\tfrac{1}{2}, \tfrac{1}{2}(n_1 + n_2 - 2))$. Finally, we remark that the critical region $W_\alpha$
consists of all permutations which yield values of $g$ for which $P(g > g_\alpha) = \alpha$
where $g_\alpha$ would be approximated as the upper $100\alpha\%$ point of the beta
distribution $Be(\tfrac{1}{2}, \tfrac{1}{2}(n_1 + n_2 - 2))$. The difference between the means of
$O_{n_1}$ and $O'_{n_2}$ would be judged significant at the $100\alpha\%$ level of significance
if $g(x_1, \ldots, x_{n_1}, x'_1, \ldots, x'_{n_2}) > g_\alpha$. Wald and Wolfowitz (1944) have
considered the asymptotic distribution of $g$ for large $n_1$ and $n_2$ and have
given a proof which is equivalent to showing that $g_\alpha$ as given by the beta
distribution $Be(\tfrac{1}{2}, \tfrac{1}{2}(n_1 + n_2 - 2))$ is asymptotically correct for large values
of $n_1 + n_2$.
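
For very small samples the Fisher-Pitman test can be carried out exactly rather than through the beta approximation. The following is a minimal sketch, not Pitman's own computational procedure: it evaluates $g$ of (14.4.3) for every one of the $\binom{n_1+n_2}{n_1}$ ways of splitting the pooled sample and reports the proportion of splits whose $g$ is at least as large as the observed value. The function names and the tolerance `1e-12` used for floating-point comparison are assumptions of this example.

```python
from itertools import combinations
from statistics import mean

def g_statistic(y1, y2):
    """Test function g of (14.4.3) for a split of the pooled sample."""
    n1, n2 = len(y1), len(y2)
    m1, m2 = mean(y1), mean(y2)
    # (n1 - 1) s^2 + (n2 - 1) s'^2: the within-sample sums of squares.
    within = sum((v - m1) ** 2 for v in y1) + sum((v - m2) ** 2 for v in y2)
    between = n1 * n2 * (m1 - m2) ** 2
    return between / ((n1 + n2) * within + between)

def fisher_pitman_pvalue(x, xp):
    """Exact randomization p-value: share of splits with g >= observed g."""
    pooled = list(x) + list(xp)
    n1 = len(x)
    g_obs = g_statistic(x, xp)
    count = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        y1 = [pooled[i] for i in idx]
        y2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        count += g_statistic(y1, y2) >= g_obs - 1e-12
    return count / total

print(g_statistic([1, 2, 3], [4, 5, 6]))            # equals t^2 / (n1 + n2 - 2 + t^2)
print(fisher_pitman_pvalue([1, 2, 3], [4, 5, 6]))   # 2 of the 20 splits are this extreme
```

Because the denominator of $g$ is the same for every split, ranking the splits by $g$ is equivalent to ranking them by $(\bar{y} - \bar{y}')^2$, which is why the enumeration is feasible only for small $n_1 + n_2$.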

As a second example of a component randomization test we consider
the standard Model I analysis of variance test for testing the hypothesis
that column effects are 0 in an experimental layout of $r$ rows and $s$ columns
(see Section 10.6) under the assumption that the observations in each row
constitute a sample of size $s$ from a population having a continuous c.d.f.,
the c.d.f.'s corresponding to rows being identical except for means. If
$x_{\xi\eta}$ ($\xi = 1, \ldots, r$; $\eta = 1, \ldots, s$) is the sample, the test function considered
by Welch (1937) and by Pitman (1938) is

(14.4.4)  $g = \dfrac{\sum (\bar{x}_{\cdot\eta} - \bar{x})^2}{\sum (x_{\xi\eta} - \bar{x}_{\xi\cdot} - \bar{x}_{\cdot\eta} + \bar{x})^2 + \sum (\bar{x}_{\cdot\eta} - \bar{x})^2}$

where $\sum$ denotes summation for $\xi = 1, \ldots, r$, $\eta = 1, \ldots, s$; $\bar{x}_{\xi\cdot}$, $\bar{x}_{\cdot\eta}$
being the means of the $\xi$th row and $\eta$th column respectively, in the sample,
whereas $\bar{x}$ is the mean of all $rs$ $x$'s in the sample. Pitman, using the method
of component randomization, determined the first four moments of $g$
under all permutations of the $x$'s within rows and showed that its distribution
could be satisfactorily approximated by the beta distribution
$Be(\tfrac{1}{2}(s - 1), \tfrac{1}{2}(r - 1)(s - 1))$, which is the exact distribution of $g$ in
normal theory under the assumption that column effects are 0 as set forth
in Section 10.6. The critical region $W_\alpha$ of points in the space of
permutations on which this test is based consists of all permutation points
for which $P(g > g_\alpha) = \alpha$, where $g_\alpha$ is the upper $100\alpha\%$ point of the beta
distribution given above.

Arnold (1958) has extended this randomization test to the case where
$x_{\xi\eta}$ is a vector and $s = 2$. In this case the test is equivalent to Hotelling's
$T^2$ for two samples as defined and discussed in Chapter 18. Welch (1937)
has also used the component randomization method for investigating
the Model I analysis of variance test for Latin Squares.

Hoeffding (1952) has made a study of the power of component random¬ 
ization tests for large samples, which includes the two examples given 
above, and he has shown that they are asymptotically as powerful for 
large samples as the corresponding parametric tests. 


(c) Rank Randomization Tests 

The method of component randomization is not strictly a distribution-free
procedure, since test functions and their distribution theory under
component randomization depend on the values of the components in
an observed or realized sample. However, the property of being distribution-free
can be achieved by using rank randomization tests. In rank
randomization tests, the ranks of sample components are used rather than
the components themselves. Thus, if $x_1, \ldots, x_n$ are components of an
$n$-dimensional random variable from a continuous c.d.f. $F(x_1, \ldots, x_n)$, the
ranks $a_1, \ldots, a_n$ of $x_1, \ldots, x_n$, respectively, are the subscripts of the $x$'s
when placed in increasing order. That is, $a_\xi$ is the subscript of the $\xi$th
smallest $x$ in $(x_1, \ldots, x_n)$. Thus, $a_1, \ldots, a_n$ is a permutation of $1, \ldots, n$.
Let the space of all $n!$ permutations of $1, \ldots, n$ be $S'$. Every point in the
sample space $R_n$ of $(x_1, \ldots, x_n)$ for which $x_{i_1} < \cdots < x_{i_n}$ maps into the
point $(i_1, \ldots, i_n)$ of the space $S'$. If $(x_1, \ldots, x_n)$ has a c.d.f. $F_0(x_1, \ldots, x_n)$
which is symmetric in the $x$'s then we have

(14.4.5)  $p_0(i_1, \ldots, i_n) = \dfrac{1}{n!}$

at each point in $S'$. As the reader will recall from (14.4.1), $p_0(x_{i_1}, \ldots, x_{i_n} \mid S)$
is a conditional probability distribution for a fixed $(x_1, \ldots, x_n)$. The
corresponding distribution $p_0(i_1, \ldots, i_n)$ of ranks, however, is not a
conditional distribution. If $F_1(x_1, \ldots, x_n)$ is any $n$-dimensional continuous
c.d.f. the probability distribution $p_1(i_1, \ldots, i_n)$ is given by

(14.4.6)  $p_1(i_1, \ldots, i_n) = \displaystyle\int_{R(i_1, \ldots, i_n)} dF_1(x_1, \ldots, x_n)$

where $R(i_1, \ldots, i_n)$ is the region for which $x_{i_1} < \cdots < x_{i_n}$.

The problem of constructing a critical region $W'_\alpha$ for testing the c.d.f. $F_0$
against the c.d.f. $F_1$ from points in $S'$ is entirely similar to that of constructing
a critical region $W_\alpha$ for testing the p.d.f. $f_0$ against the p.d.f. $f_1$ from
points in $S$ which has been discussed earlier. In the case of $W'_\alpha$, however,
the test does not depend on the observed point $(x_1, \ldots, x_n)$ as in the case
of $W_\alpha$. In this sense $W'_\alpha$ is a more truly nonparametric test than $W_\alpha$. It
should be noted that a rank randomization test has no power for discriminating
$F_0$ from $F_1$ if both c.d.f.'s are symmetric in $x_1, \ldots, x_n$.

As in component randomization tests there is considerable technical
difficulty in the strict application of the principle of rank randomization
to the determination of $W'_\alpha$. In order to make progress in rank randomization
test theory it has been found necessary to use test functions
$h(a_1, \ldots, a_n)$ defined at all points in $S'$ suggested by analogous parametric
testing problems.

Although a considerable number of rank randomization tests have been 
developed, we shall consider only three examples. 
(d) Examples of Rank Randomization Tests

The Wilcoxon test $T$ referred to in (14.3.88) is a rank randomization
test for testing $\mathscr{A}$ against $\mathscr{B}$, where $\mathscr{A}$ (the null hypothesis)
is the class of continuous and symmetric $(n_1 + n_2)$-dimensional c.d.f.'s of
form $F(x_1) \cdots F(x_{n_1}) F(x'_1) \cdots F(x'_{n_2})$, whereas $\mathscr{B}$ is the class of $(n_1 + n_2)$-dimensional
continuous c.d.f.'s of form $F_1(x_1) \cdots F_1(x_{n_1}) F_2(x'_1) \cdots F_2(x'_{n_2})$,
with $a < \tfrac{1}{2}$, $a$ being defined in (14.3.92). Denoting $x_1, \ldots, x_{n_1}, x'_1, \ldots, x'_{n_2}$
by $y_1, \ldots, y_{n_1+n_2}$, with $a_1, \ldots, a_{n_1+n_2}$ as the ranks of the $y$'s, the
Wilcoxon statistic $T$ is simply $h(a_1, \ldots, a_{n_1+n_2})$ where

(14.4.7)  $h(a_1, \ldots, a_{n_1+n_2}) = \sum_{\xi=1}^{n_1} a_\xi$.

Another example of a randomization test based on a rank statistic is the
rank correlation coefficient. Let $(x_1, y_1; x_2, y_2; \ldots; x_n, y_n)$ be a sample of
size $n$ from a population having a continuous c.d.f. $F(x, y)$. Let $a_1, \ldots, a_n$
be the ranks of $x_1, \ldots, x_n$ and $b_1, \ldots, b_n$ the ranks of $y_1, \ldots, y_n$. The
rank statistic $h(a_1, \ldots, a_n; b_1, \ldots, b_n)$ used here is the ordinary correlation
coefficient $R$ between the two sets of ranks. The hypothesis to be tested
by $R$ is $\mathscr{A}$ against $\mathscr{B}$, where $\mathscr{B}$ is the class of continuous $2n$-dimensional
c.d.f.'s of form $F(x_1, y_1) \cdots F(x_n, y_n)$ whereas $\mathscr{A}$ (the null hypothesis) is
the subclass of $\mathscr{B}$ for which $F(x_\xi, y_\xi) = F_1(x_\xi) F_2(y_\xi)$, $\xi = 1, \ldots, n$,
$F_1(x)$ and $F_2(y)$ being continuous c.d.f.'s. Hotelling and Pabst (1936)
investigated the distribution of $R$ when the null hypothesis is true, that is,
when the distribution of $(x_1, y_1; \ldots; x_n, y_n)$ belongs to $\mathscr{A}$, and showed
that $\sqrt{n}\, R$ is asymptotically distributed according to $N(0, 1)$ for large $n$.
Olds (1938) tabulated the exact distribution of a quantity $S$ related to $R$
by the equation

(14.4.8)  $R = 1 - 6S/(n^3 - n)$.
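
The relation (14.4.8) can be illustrated numerically. The text does not define $S$ explicitly; the sketch below assumes, as is standard for Olds's tables, that $S$ is the sum of squared differences between paired ranks, $S = \sum (a_\xi - b_\xi)^2$, and that there are no ties. It computes $R$ both as the ordinary correlation coefficient of the ranks and through (14.4.8). All function names are hypothetical.

```python
import math

def ranks(values):
    """Rank of each value within its own sample (1 = smallest); no ties assumed."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def pearson(u, v):
    """Ordinary correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def rank_correlation(x, y):
    """Return (R from correlating ranks, R from (14.4.8), S)."""
    a, b = ranks(x), ranks(y)
    n = len(x)
    s = sum((ai - bi) ** 2 for ai, bi in zip(a, b))   # assumed definition of S
    return pearson(a, b), 1 - 6 * s / (n ** 3 - n), s

x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
r_direct, r_from_s, s = rank_correlation(x, y)
print(r_direct, r_from_s, s)
```

The two values of $R$ agree whenever the ranks are untied, which holds with probability 1 under a continuous c.d.f.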

If the critical region of the $R$ test is taken as the set in sample space
for which $R^2 > R_\alpha^2$ where $R_\alpha^2$ is the smallest number for which $P(R^2 > R_\alpha^2) \le \alpha$,
Hoeffding (1948b) showed that the test is not consistent for testing
$\mathscr{A}$ against all alternatives in $\mathscr{B} - \mathscr{A}$.

Hoeffding (1948b) has devised a randomization test based on ranks
for testing $\mathscr{A}$ against $\mathscr{B}$ as defined above. His statistic is

(14.4.9)  $D = \dfrac{(n - 5)!}{n!}\, [A - 2(n - 2) B + (n - 2)(n - 3) C]$
where

(14.4.10)  $A = \sum_{\xi=1}^{n} a_\xi (a_\xi - 1)\, b_\xi (b_\xi - 1)$,  $B = \sum_{\xi=1}^{n} (a_\xi - 1)(b_\xi - 1)\, c_\xi$,  $C = \sum_{\xi=1}^{n} c_\xi (c_\xi - 1)$

where $c_\xi$ is the number of sample pairs $(x_\eta, y_\eta)$ for which $x_\eta < x_\xi$
and $y_\eta < y_\xi$.

$D$ is an unbiased estimator for the quantity

(14.4.11)  $\Delta = \displaystyle\iint [F(x, y) - F(x, \infty) F(\infty, y)]^2\, dF(x, y)$

with variance, under the null hypothesis, given by

(14.4.12)  $\sigma^2(D) = \dfrac{2(n^2 + 5n - 32)}{8100\, n(n - 1)(n - 2)(n - 3)}$.

Hoeffding has determined $\sigma^2(D)$ under the hypothesis that the $2n$-dimensional
distribution is in $\mathscr{B} - \mathscr{A}$.

Important rank randomization tests for testing various hypotheses in 
the analysis of variance have been devised by Friedman (1937), Kendall 
and Smith (1939), Kruskal (1952), Wallis (1939), and Wormleighton 
(1959). Andrews (1954) has studied the power of some of these tests in 
large samples. 


PROBLEMS 

14.1 Suppose $\mathcal{F}_0$ is the class of all continuous c.d.f.'s having quantiles
$x_{01}, x_{02}, \ldots, x_{0k}$ at $p_1, p_2, \ldots, p_k$, where $0 = p_0 < p_1 < \cdots < p_k < p_{k+1} = 1$
respectively. Let $\mathcal{F}$ be the class of all continuous c.d.f.'s. Let $I_1, \ldots, I_{k+1}$ be the
intervals $(-\infty, x_{01}]$, $(x_{01}, x_{02}]$, $\ldots$, $(x_{0k}, +\infty)$ respectively. In a sample of
size $n$ let $n_1, \ldots, n_{k+1}$ ($n_1 + \cdots + n_{k+1} = n$) be the numbers of sample components
falling into $I_1, \ldots, I_{k+1}$ respectively. Show that for large $n$ the set of
points in the sample space of $(n_1, \ldots, n_{k+1})$ for which

$\sum_{i=1}^{k+1} \dfrac{[n_i - n(p_i - p_{i-1})]^2}{n(p_i - p_{i-1})} > \chi^2_{k,\alpha}$

is a consistent test of size $\alpha$ for testing $\mathcal{F}_0$ against $\mathcal{F} - \mathcal{F}_0$, where $P(\chi^2 > \chi^2_{k,\alpha}) = \alpha$,
$\chi^2$ being a random variable having the chi-square distribution $C(k)$.

14.2 Let be the class of all /i-dimensional c.d.f.’s of form Fiix^F^ix^ • • • 

^n(^n) where = • • = FJxq^ = p. Let be the class of all /i- 

dimensional c.d.f.’s of form Fy{x^Fj<x^ • • • where F^ix^^ = p^^ 

{ ■» 1,..., n andlet(pi + • • • -t- pn)ln -*p' ^ as n -► qo. Let r be the number 
of components of the n-dimensional random variable (x^,... x^) which lie on 
( — oo, and let be defined as in (14.1.2). Show that as « -► oo, is a
consistent test of size a for testing '^p)* 

14.3 Verify formula (14.2.12). 

14.4 Show from (14.2.12) that 

"-(S) 

as m, n CO so that njm -> p : • 0 . 

14.5 Show that the random variable .Vj defined in Section 14.2(f) has 

Ninipc ^ nipe~P{\ — p p — pe P)) 

as its asymptotic distribution for large m, n when n = pm + 0 ( 1 ), p ;> 0. 

14.6 Show that the coefficient of in (14.2.15) is identical with the 
coefficient of in (14.2.16). 

14.7 Prove (14.2.24). 

14.8 Show that if (rf, . . . , rf) are any t of the random variables (r^, . . ., 
having the p.f. (14.3.7), the p.f. of (rf, . . . , rf) is given by (14.3.8). 

14.9 Verify (14.3.13). 

14.10 Verify (14.3.14). 

14.11 From (14.3.14) show that 


lim ^ 



(1 +p)*^^ 


as «2 so that p -0. 

14.12 {Continuation). Under the same limiting conditions show that 


lim S 




[1 +ft,-(w)]'^i 


du. 


14.13 Jn 14.3.1 let be the number of .r’s which exceed c^,,and mg the number 
of .r'’s which exceed Show that the p.f. of (mj, mg) is given by 


/7(mi, mg) 



where and /Wg are non-negative integers such that nii + mg = + /I 2 

[Mood (1950)]. 

14.14 Suppose $\mathcal{H}_1$ is the class of all pairs of continuous c.d.f.'s of form
$(F(x), F(x - \delta))$ where $\delta$ is a constant, and let $\mathcal{H}_0$ be the class of all pairs of
identical continuous c.d.f.'s. Show that the Mann-Whitney test is consistent
for testing any element of $\mathcal{H}_0$ against any element of $\mathcal{H}_1$ for which $\delta > 0$.

14.15 (Continuation). If $(x_{(1)}, \ldots, x_{(n_1)})$ and $(x'_{(1)}, \ldots, x'_{(n_2)})$ are the order
statistics of independent samples of sizes $n_1$ and $n_2$ from $F(x)$ and $F(x - \delta)$
respectively, show that


/>(a;(r, - < i) 






and hence that if r < j and r' > s', is a lOOy % confidence 

interval for d where 


y = 1 




[Mood (1950)]. 

14.16 Verify (14.3.78). 

14.17 Verify (14.3.88). 

14.18 Establish the difference equation (14.3.90). 

14.19 Verify (14.3.93). 

14.20 Show that ^{ D) = A where D is defined by (14.4.9) and A by (14.4.11). 

14.21 Bertrand's (1887) ballot problem. Suppose there are $N_1$ ballots in a box
marked for A, and $N_2$ marked for B, where $N_1 > N_2$. Suppose the ballots are
drawn out one at a time, and let $y_t$ be a random variable denoting the number of
votes for A minus the number for B when $t$ ballots have been drawn and counted.
If the $\binom{N_1 + N_2}{N_1}$ different possible (distinct) orders of drawing out all ballots
are assigned equal probabilities, show (by methods similar to those used in
Section 14.3) that the probability is $(N_1 - N_2)/(N_1 + N_2)$ that A leads B
(that is, that $y_1, \ldots, y_{N_1+N_2}$ are all $> 0$) throughout the counting.


14.22 (Continuation). Suppose $N_1 = N_2 = N$, in which case $y_{2N} = 0$, and
consider the $\binom{2N}{N}$ polygonal paths reaching from $(0, 0)$ to $(2N, 0)$ obtained by
consecutively connecting the $2N + 1$ points $(0, 0), (1, y_1), \ldots, (2N, 0)$ in the
$xy$-plane. Note that there can be only an even number of segments of any path
lying above (or below) the $x$-axis. Show that the number of paths having $2n$
segments lying above the $x$-axis (and $2N - 2n$ below) is

$\frac{1}{N+1}\binom{2N}{N}$

and hence that the probability that a fraction $n/N$ of a path will lie above the
$x$-axis is given by

$p_N(n) = \frac{1}{N+1}, \quad n = 0, 1, \ldots, N,$

if all paths are assigned equal probabilities.

14.23 Long leads in coin-tossing [Chung and Feller (1949)]. If a coin is
tossed $2N$ times, let $y_t$ be a random variable denoting the number of heads
minus the number of tails in the first $t$ tosses, and consider the $2^{2N}$ possible
polygonal paths connecting the points $(0, 0), (1, y_1), \ldots, (2N, y_{2N})$. Show that
the number of paths having $2n$ segments above the $x$-axis (and, of course,
$2N - 2n$ below) is

$\binom{2n}{n}\binom{2N-2n}{N-n}$

and hence the probability that the fraction $n/N$ of a path lies above the $x$-axis is

$\binom{2n}{n}\binom{2N-2n}{N-n}\, 2^{-2N},$

all paths being assigned equal probabilities (that is, assuming $2N$ independent
throws of a true coin).

14.24 (Continuation). The (first) arc sine law [Chung and Feller (1949)]. By
using Stirling's formula for large factorials show that the limiting c.d.f. $G(u)$ of
the random variable $n/N$ as $N \to \infty$ is given by

$G(u) = \frac{2}{\pi} \sin^{-1} \sqrt{u}.$

Note that the p.d.f. of this limiting c.d.f. is given by $g(u) = 1/[\pi\sqrt{u(1 - u)}]$,
which is a U-shaped density function, with infinite density at the ends of the
interval (0, 1), thus indicating greater likelihood that either relatively large
fractions or relatively small fractions of a path lie above the $x$-axis rather than
fractions near $\frac{1}{2}$ as intuition tends to suggest.
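
The counting result of Problem 14.23 and the limit in Problem 14.24 can be checked numerically. The following is only a sketch: the function names are hypothetical, a segment is counted as lying above the axis when at least one of its endpoints is strictly positive (an assumed convention), and the brute-force enumeration is practical only for small $N$.

```python
from itertools import product
from math import asin, comb, pi, sqrt


def segments_above(steps):
    """Count the segments of a +/-1 path that lie above the x-axis."""
    y = 0
    above = 0
    for step in steps:
        y_next = y + step
        # The segment from (t-1, y) to (t, y_next) is taken to lie above the
        # axis when at least one endpoint is strictly positive (assumption).
        if y > 0 or y_next > 0:
            above += 1
        y = y_next
    return above


def brute_force_counts(N):
    """counts[n] = number of the 2^(2N) paths with 2n segments above the axis."""
    counts = [0] * (N + 1)
    for steps in product((1, -1), repeat=2 * N):
        counts[segments_above(steps) // 2] += 1
    return counts


def formula_counts(N):
    """Counts given by the product of binomial coefficients in Problem 14.23."""
    return [comb(2 * n, n) * comb(2 * N - 2 * n, N - n) for n in range(N + 1)]


def exact_cdf(N, u):
    """P(n/N <= u) for 2N tosses, using the exact counts."""
    favourable = sum(c for n, c in enumerate(formula_counts(N)) if n <= u * N)
    return favourable / 4 ** N


def arcsine_cdf(u):
    """Limiting c.d.f. G(u) of Problem 14.24."""
    return (2 / pi) * asin(sqrt(u))
```

For moderate $N$ the exact c.d.f. is already close to $G(u)$, and the end values $n = 0$ and $n = N$ carry the largest probabilities, which is the U-shape referred to above.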



CHAPTER 15 


Sequential Statistical Analysis 


15.1 INTRODUCTORY REMARKS 

In all of the theory of sampling, estimation, and testing of statistical 
hypotheses which has been discussed in Chapters 8 to 14, the size of the 
sample has been held fixed. All of the theory has involved the determination 
of sampling distributions of certain statistics, that is, functions of the 
sample components for a given sample size. The limiting forms of some of 
these distributions as the sample size increases indefinitely have been 
determined. In virtually none of this theory have we considered situations 
in which the sample size is itself a random variable. However, in Chapters 
6 and 7 we discussed several waiting-time distributions in which the number 
of trials required to achieve some objective is a random variable. The 
situations leading to these waiting-time distributions are examples of 
simple sequential-like statistical procedures. This chapter is devoted 
mainly to tests of statistical hypotheses, although some results concerning 
sequential estimation are given. 

The results we shall consider are the basic ones pertaining to the testing 
of simple hypotheses. Some results concerning the testing of composite 
hypotheses will be found in Wald’s book (1947a), and more recent results 
will be found in papers by Barnard (1952), and Cox (1952). Results on 
sequential t-tests have been obtained by David and Kruskal (1956), and 
Rushton (1950, 1952). 

The basic idea of a sequential test originated with Dodge and Romig 
(1929) who devised a double sampling scheme for deciding whether to 
accept or reject a lot of N articles containing an unknown number of 
defectives $N\theta$, where $\theta$ is a multiple of $1/N$ on [0, 1]. Under the Dodge-Romig
scheme a sample of size $n_1$ is drawn from the lot and one of three
alternative decisions is made, depending as follows on the number $m_1$
of defectives found in the sample:

(i) Accept the lot if $m_1 \le c_1$.

(ii) Reject the lot if $m_1 > c_2$.

(iii) Draw a second sample of size $n_2$ if $c_1 < m_1 \le c_2$.

If decision (iii) is taken, that is, if a second sample of size $n_2$ is drawn,
then one of two decisions is made depending on the number $m_2$ of defective
articles in the second sample:

(i) Accept lot if $m_1 + m_2 \le c_2$.

(ii) Reject lot if $m_1 + m_2 > c_2$.

A double sampling plan is completely determined by specifying the four
numbers $n_1, n_2, c_1, c_2$. The total sample size is clearly a random variable $n$
which can take on two values, namely, $n_1$ and $n_1 + n_2$. A thorough
treatment of both single and double sampling plans is given in the book by
Dodge and Romig (1959).

To see that a double sampling plan together with the decisions associated 
with its operation essentially constitute a statistical test we merely have to 
note that rejection of the lot occurs if and only if the two-dimensional 
random variable $(m_1, m_2)$ falls in a critical region $W$ defined as $E_1 \cup E_2$
where $E_1$ is the set of points in the sample space of $(m_1, m_2)$ for which
$m_1 > c_2$ and $E_2$ is the set for which $c_1 < m_1 \le c_2$ and $m_1 + m_2 > c_2$.
$P(W \mid \theta)$, the probability of rejection of the lot if its fraction of defectives
is $\theta$, can be written with no particular difficulty in terms of probabilities
given by hypergeometric distributions. $P(W \mid \theta)$, as a function of $\theta$, is,
of course, the power function of the "test" $W$. If we let

$L(\theta) = 1 - P(W \mid \theta),$

then $L(\theta)$ is the probability of accepting a lot having fraction defective $\theta$.
Using the notation and terminology in Historical Remark of Section 13.1,
we say that the lot is in the zone of preference if $\theta \le \theta_0$, in the zone of
indifference if $\theta_0 < \theta < \theta_1$, and in the zone of rejection if $\theta \ge \theta_1$. For a
given producer's risk $\alpha$ (risk of Type I error) at $\theta = \theta_0$ and consumer's
risk $\beta$ (risk of Type II error) at $\theta = \theta_1$, we then have

$L(\theta_0) = 1 - \alpha, \quad L(\theta_1) = \beta$

where, in practical applications, values of $\alpha$ and $\beta$ are usually taken in the
interval [0.01, 0.10]. The operating characteristic curve, that is, the graph
of $L(\theta)$ as a function of $\theta$, passes through the points $(0, 1)$, $(\theta_0, 1 - \alpha)$,
$(\theta_1, \beta)$, and $(1, 0)$.
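
To make the statement about hypergeometric probabilities concrete, the acceptance probability $L(\theta)$ of a double sampling plan may be computed as in the following sketch. The function names are hypothetical, and the decision rule assumed is the one just described: accept at the first stage if $m_1 \le c_1$, reject if $m_1 > c_2$, and otherwise accept at the second stage if $m_1 + m_2 \le c_2$. The second sample is assumed to be drawn without replacement from the remainder of the lot.

```python
from math import comb


def hypergeom_pmf(k, lot, defectives, sample):
    """P(k defectives in a sample drawn without replacement from the lot)."""
    if k < 0 or k > sample or k > defectives or sample - k > lot - defectives:
        return 0.0
    return comb(defectives, k) * comb(lot - defectives, sample - k) / comb(lot, sample)


def acceptance_probability(lot, defectives, n1, n2, c1, c2):
    """L(theta) for theta = defectives / lot under the double sampling plan."""
    accept = 0.0
    for m1 in range(0, min(n1, defectives) + 1):
        p1 = hypergeom_pmf(m1, lot, defectives, n1)
        if p1 == 0.0:
            continue
        if m1 <= c1:
            # Accepted at the first stage.
            accept += p1
        elif m1 <= c2:
            # A second sample is drawn from the lot - n1 remaining articles.
            for m2 in range(0, c2 - m1 + 1):
                accept += p1 * hypergeom_pmf(m2, lot - n1, defectives - m1, n2)
    return accept


# Operating characteristic curve for an illustrative (assumed) plan.
oc_curve = [acceptance_probability(100, d, 10, 20, 0, 2) for d in range(101)]
```

The list `oc_curve` starts at 1 for a lot with no defectives and falls to 0 for a lot consisting entirely of defectives, in agreement with the end points $(0, 1)$ and $(1, 0)$ noted above.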

The idea of a more truly sequential procedure is due to Bartky (1943) 
who extended the notion of double sampling to multiple sampling for the 
case of an infinitely large lot having an unknown fraction of defectives $\theta$. 
Other early examples of mathematical studies of multistage procedures 
include Mahalanobis’ (1940) series of sample censuses of jute acreage in 
Bengal, and Hotelling’s (1941) notion of a series of experiments to deter¬ 
mine the maximum of a regression function, and stochastic approximation 
methods by Dixon and Mood (1948). It remained largely for Wald (1945) 
however, to take the major step toward a general theory of sequential 
analysis as we know it today. In this chapter we shall present only a brief 
account of some of the basic results in sequential analysis due mainly to 
Wald. The reader interested in further details about the more classical 
results should consult Wald’s (1947a) book. For more recent mathemati¬ 
cal investigations of sequential-like procedures the reader is referred to 
papers by Dvoretzky, Kiefer and Wolfowitz (1953a, 1953b), Kiefer (1948, 
1953, 1957), Kiefer and Wolfowitz (1952), Robbins (1952), Robbins and 
Monro (1951), and others. 

15.2 THE BASIC STRUCTURE OF A SEQUENTIAL TEST 

(a) Description of Events in a Sequential Test 

The Dodge-Romig double sampling scheme described above suggests 
the essential structure of a general sequential test. Suppose we have a 
stochastic process $(x_1, x_2, \ldots)$ which depends on a real parameter $\theta$. This
means that every finite set of the components $x_1, x_2, \ldots$ has a c.d.f.
depending on $\theta$ whose parameter space is $\Omega$. Thus, $(x_1, \ldots, x_n)$ has a c.d.f.
$F(x_1, \ldots, x_n; \theta)$, for $n = 1, 2, \ldots$. We shall first consider the simple
hypothesis $\mathcal{H}(\theta_0; \Omega)$ (more briefly $\mathcal{H}_0$) that $\theta = \theta_0$. Let $R_1^{(1)}, \ldots, R_1^{(n)}$
be sample spaces of $x_1, \ldots, x_n$ respectively, and let $R_n = R_1^{(1)} \times \cdots \times R_1^{(n)}$
be the sample space of $(x_1, \ldots, x_n)$, $n = 1, 2, \ldots$. It will also be convenient
to use $R_\infty$ where $R_\infty = R_1^{(1)} \times R_1^{(2)} \times \cdots$.

For each positive integer $n$ let $G_n^0$, $G_n^1$, $G_n$ be disjoint events in $R_n$ such
that

(15.2.1)    $G_n^0 \cup G_n^1 \cup G_n = G_{n-1} \times R_1^{(n)},$

where for $n = 1$ the right-hand side is taken to be $R_1$.

We begin the sequential experiment by drawing $x_1$ and adopt the
following rules for stopping or continuing the experiment:

(15.2.2)
    if $x_1 \in G_1^0$ accept $\mathcal{H}_0$ without further drawings
    if $x_1 \in G_1^1$ reject $\mathcal{H}_0$ without further drawings
    if $x_1 \in G_1$ draw $x_2$;

(15.2.3)
    if $(x_1, x_2) \in G_2^0$ accept $\mathcal{H}_0$ without further drawings
    if $(x_1, x_2) \in G_2^1$ reject $\mathcal{H}_0$ without further drawings
    if $(x_1, x_2) \in G_2$ draw $x_3$;

and, in general,

(15.2.4)
    if $(x_1, \ldots, x_n) \in G_n^0$ accept $\mathcal{H}_0$ without further drawings
    if $(x_1, \ldots, x_n) \in G_n^1$ reject $\mathcal{H}_0$ without further drawings
    if $(x_1, \ldots, x_n) \in G_n$ draw $x_{n+1}$,
    $n = 1, 2, \ldots$.

It is convenient to call $G_1, G_2, \ldots$ the sequence of experiment continuation
events.

Note that $G_1^0, \ldots, G_{n-1}^0, G_1^1, \ldots, G_{n-1}^1$ are cylinder events in $R_n$, and
$G_n^0$, $G_n^1$, $G_n$ are events in $R_n$ such that the $2n + 1$ events are disjoint and

(15.2.5)    $G_1^0 \cup \cdots \cup G_n^0 \cup G_1^1 \cup \cdots \cup G_n^1 \cup G_n = R_n.$

However, more conveniently, these same events, for any positive integer $n$,
are disjoint cylinder events in $R_\infty$ whose union is $R_\infty$ itself.

The events $G_1^0, \ldots, G_n^0, G_1^1, \ldots, G_n^1, G_n$, $n = 1, 2, \ldots$, together with
the decision procedure defined in (15.2.2), (15.2.3), and (15.2.4) define a
sequential test $S$ for $\mathcal{H}_0$. The critical region for rejecting $\mathcal{H}_0$ is
$G_1^1 \cup G_2^1 \cup \cdots$ where the $G$'s are chosen so that $P(G_1^1 \cup G_2^1 \cup \cdots \mid \theta_0) = \alpha$. The
procedure for carrying out a sequential test step-by-step will be referred
to as the sequential process for $S$.

It should be observed that any test for $\mathcal{H}_0$ based on a sample of fixed
size $n$ and having rejection region $W$ in the sample space can be viewed as
what might be called a degenerate sequential test in which $G_1^0, \ldots, G_{n-1}^0$,
$G_1^1, \ldots, G_{n-1}^1$, $G_n$ are null sets while $G_n^1 = W$ and $G_n^0 = \bar{W}$.

(b) Probabilities in a Sequential Test 

The probabilities of accepting $\mathcal{H}_0$, rejecting $\mathcal{H}_0$, and drawing $x_{n+1}$
upon the drawing of $x_1, \ldots, x_n$ are

(15.2.6)    $P(G_n^0 \mid \theta), \quad P(G_n^1 \mid \theta), \quad P(G_n \mid \theta)$

respectively. As we have already stated the events $G_1^0, \ldots, G_n^0, G_1^1, \ldots,
G_n^1, G_n$ are disjoint and their probabilities are functions of the parameter $\theta$.
Let us set

(15.2.7)
    $L_n(\theta) = P(G_1^0 \mid \theta) + \cdots + P(G_n^0 \mid \theta)$
    $M_n(\theta) = P(G_1^1 \mid \theta) + \cdots + P(G_n^1 \mid \theta)$
    $N_n(\theta) = P(G_n \mid \theta)$

and

(15.2.8)
    $L(\theta) = \lim_{n \to \infty} L_n(\theta) = P(G^0 \mid \theta)$
    $M(\theta) = \lim_{n \to \infty} M_n(\theta) = P(G^1 \mid \theta)$
    $N(\theta) = \lim_{n \to \infty} N_n(\theta)$

where $G^0 = G_1^0 \cup G_2^0 \cup \cdots$, and $G^1 = G_1^1 \cup G_2^1 \cup \cdots$. These limits
exist since $L_n(\theta)$, $M_n(\theta)$, $N_n(\theta)$ are non-negative with sum 1 and since
$L_n(\theta)$ and $M_n(\theta)$ are nondecreasing as $n = 1, 2, \ldots$. $L(\theta')$ and $M(\theta')$ are
the probabilities of accepting $\mathcal{H}_0$ and of rejecting $\mathcal{H}_0$ if the true value of $\theta$
in the population is $\theta'$. $L(\theta)$, as a function of $\theta$, is the operating characteristic
function of the sequential test $S$. $N(\theta)$ is the probability that the sequential
process for $S$ continues indefinitely. $N(\theta) = 0$ means that the sequential
process terminates with probability 1. It is evident that

(15.2.9)    $L(\theta) + M(\theta) + N(\theta) = 1.$

If we let $G_n^0 \cup G_n^1 = G_n^*$, and put

(15.2.10)    $p(n \mid \theta) = P(G_n^* \mid \theta)$

then $p(n \mid \theta)$ is the probability that the sequential process for $S$ terminates
(with a decision to accept or reject $\mathcal{H}_0$) with $n$ trials, if $\theta$ is the true
value of the parameter.

If the sequential process for $S$ terminates with probability 1, that is,
if $N(\theta) = 0$, then the average number of trials required to terminate the
process is

(15.2.11)    $\mathcal{E}(n \mid \theta) = \sum_{t=1}^{\infty} t\, p(t \mid \theta)$

which, of course, is a function of $\theta$. $\mathcal{E}(n \mid \theta)$ is commonly referred to in
sequential analysis as the average sample number. In the case of a degenerate
sequential test, that is, a test based on a sample of fixed size $n$, we have,
of course,

$\mathcal{E}(n \mid \theta) = n.$

It will be noted that since

$\mathcal{E}(n \mid \theta) \ge \sum_{t=1}^{m} t\, p(t \mid \theta) + m P(G_m \mid \theta),$

$m = 1, 2, \ldots$, it is necessary for $\lim_{m \to \infty} m P(G_m \mid \theta) = 0$, and therefore for
$N(\theta) = 0$, in order for $\mathcal{E}(n \mid \theta)$ to be finite.

For the case of a simple random sampling process from a distribution
having a finite mean $\mathcal{E}(x)$, the following theorem concerning
$\mathcal{E}(n)$, due to Wald (1945) and Blackwell (1946), will be useful in later
sections:

15.2.1 Suppose $(x_1, x_2, \ldots)$ is a sequence of independent random variables
from a distribution with finite mean $\mathcal{E}(x)$, and let $S$ be a sequential
test for which $\mathcal{E}(n)$ is finite. If $x_1 + x_2 + \cdots + x_n$ denotes the
sum of $x$'s drawn until $S$ terminates, then

(15.2.12)    $\mathcal{E}(x_1 + x_2 + \cdots + x_n) = \mathcal{E}(x) \cdot \mathcal{E}(n).$

To prove 15.2.1 we note that

(15.2.13)    $\mathcal{E}(x_1 + x_2 + \cdots + x_n) = \mathcal{E}(x_1) + \mathcal{E}(x_2 \mid x_1 \in G_1) \cdot P(G_1) + \mathcal{E}(x_3 \mid (x_1, x_2) \in G_2) \cdot P(G_2) + \cdots$

where $G_1, G_2, \ldots$ are defined in (15.2.4). Since $x_1, x_2, \ldots$ are mutually
independent, and have mean values all equal to $\mathcal{E}(x)$ it is evident that
(15.2.13) reduces to

(15.2.14)    $\mathcal{E}(x_1 + x_2 + \cdots + x_n) = \mathcal{E}(x)[1 + P(G_1) + P(G_2) + \cdots].$

As before let $G_n^* = G_n^0 \cup G_n^1$, $n = 1, 2, \ldots$. Then since $G_1^*, \ldots, G_n^*, G_n$
are disjoint events in $R_n$ for each $n$ whose probabilities sum to 1, and
since finiteness of $\mathcal{E}(n)$ implies termination of $S$ with probability 1, we have

(15.2.15)    $1 = \sum_{t=1}^{\infty} P(G_t^*), \quad P(G_i) = \sum_{t=i+1}^{\infty} P(G_t^*), \quad i = 1, 2, \ldots.$

Substituting these in (15.2.14), we find

(15.2.16)    $\mathcal{E}(x_1 + x_2 + \cdots + x_n) = \mathcal{E}(x)\left[\sum_{t=1}^{\infty} t\, P(G_t^*)\right],$

which is (15.2.12), thus concluding the argument for 15.2.1.
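
Theorem 15.2.1 can be checked numerically in a simple case. The sketch below is an illustration under assumptions that are not part of the text: the $x$'s are taken to be 0 or 1 with $P(x = 1) = p$, and the (hypothetical) stopping rule terminates as soon as the number of ones minus the number of zeros reaches `upper` or `-lower`. The two expectations are accumulated exactly, step by step, up to a truncation point beyond which the remaining probability is negligible.

```python
def stopped_walk_moments(p, upper, lower, max_steps=2000):
    """Return (E[n], E[x_1 + ... + x_n], probability still unstopped).

    Each x is 1 with probability p and 0 otherwise; sampling stops when
    (number of ones) - (number of zeros) first reaches +upper or -lower.
    """
    continuing = {0: 1.0}  # difference d -> probability of still sampling
    expected_n = 0.0
    expected_sum = 0.0
    for n in range(1, max_steps + 1):
        nxt = {}
        for d, mass in continuing.items():
            for step, q in ((1, p), (-1, 1 - p)):
                d_new = d + step
                w = mass * q
                if d_new >= upper or d_new <= -lower:
                    expected_n += n * w
                    # ones - zeros = d_new and ones + zeros = n
                    expected_sum += (n + d_new) / 2 * w
                else:
                    nxt[d_new] = nxt.get(d_new, 0.0) + w
        continuing = nxt
        if not continuing:
            break
    return expected_n, expected_sum, sum(continuing.values())


expected_n, expected_sum, leftover = stopped_walk_moments(0.6, upper=3, lower=2)
# By (15.2.12) expected_sum should agree with p * expected_n = 0.6 * expected_n.
```

Since $\mathcal{E}(x) = p$ here, the two computed quantities should satisfy `expected_sum = p * expected_n` up to rounding error.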


(c) Criteria for Choosing a Sequential Test 

It will be noted that in the formal definition of a sequential test no
restrictions have been placed on the sets $G_n^0$, $G_n^1$, $G_n$ in our sequential tests,
except that for each $n$ they be disjoint sets satisfying (15.2.1). One of the
basic problems in the construction of a sequential test $S$ is that of choosing
$G_n^0$, $G_n^1$, and $G_n$. One obviously desirable requirement, and one which will
always be imposed, is that the sequential process for $S$ terminate with
probability 1, that is,

(15.2.17)    $N(\theta) = 0.$

Another desirable requirement, as in the case of all statistical tests, is to
fix the risk of a Type I error, that is, to require

(15.2.18)    $L(\theta_0) = 1 - \alpha.$

It will also be useful in comparing two sequential tests (as in the case of
statistical tests based on samples of fixed size) to fix the risk $\beta$ of a Type II
error for some value of $\theta$, say $\theta = \theta_1$, that is, to require

(15.2.19)    $L(\theta_1) = \beta.$

In considering $\theta_0$ against the alternative $\theta_1$ we shall let $\mathcal{H}_0$ denote the
hypothesis that $\theta_0$ is the true value of $\theta$, and $\mathcal{H}_1$ the hypothesis that $\theta_1$
is the true value.

The imposition of conditions (15.2.17), (15.2.18), and (15.2.19) implies,
of course, that

(15.2.20)    $M(\theta_0) = \alpha, \quad M(\theta_1) = 1 - \beta.$

Taking $\theta_0 < \theta_1$, the sets of values $\theta$ for which $\theta \le \theta_0$, $\theta_0 < \theta < \theta_1$,
$\theta \ge \theta_1$ are referred to respectively, as zones of preference for acceptance,
indifference, and rejection of $\mathcal{H}_0$. By taking $\theta_0 > \theta_1$, we can set up similar
zones.

[Fig. 15.1: graph of the operating characteristic function $L(\theta)$]

Assuming $N(\theta) = 0$, the basic features of the graph of $L(\theta)$ under
the conditions stated in (15.2.17), (15.2.18), and (15.2.19) are shown in Fig.
15.1.

A sequential test for which $L(\theta)$ satisfies (15.2.18) and (15.2.19) will be
called a sequential test of strength $(\alpha, \theta_0; \beta, \theta_1)$.

If $S_1$ and $S_2$ are two sequential tests with operating characteristic
functions $L_1(\theta)$ and $L_2(\theta)$ so that $L_1(\theta_0) = L_2(\theta_0) = 1 - \alpha$ while $L_1(\theta_1)
< L_2(\theta_1)$, we shall say that $S_1$ is stronger than $S_2$ for testing $\mathcal{H}_0$ against
$\mathcal{H}_1$. If $S_1$ is stronger than $S_2$ for all possible sequential tests $S_2$, then $S_1$,
which we may denote by $S^*$, has maximum strength.

If the average sample numbers of two tests $S_1$ and $S_2$ of equal strength
are $\mathcal{E}_1(n \mid \theta)$ and $\mathcal{E}_2(n \mid \theta)$ and if

(15.2.21)    $\mathcal{E}_1(n \mid \theta) \le \mathcal{E}_2(n \mid \theta), \quad \theta = \theta_0, \theta_1,$

with the equality not holding for both $\theta_0$ and $\theta_1$, $S_1$ will be considered as
preferable to $S_2$ for testing $\mathcal{H}_0$ against $\mathcal{H}_1$.

If the relation (15.2.21) holds for both $\theta_0$ and $\theta_1$ and for all possible
choices of $S_2$ different from $S_1$, then $S_1$, which we may denote by $S^*$, is the
sequential test of maximum efficiency. We shall show in Section 15.4 that
such a sequential test $S^*$ of maximum efficiency can be constructed by a
method based on the probability ratio.

15.3 CARTESIAN SEQUENTIAL TESTS 
(a) Description and Properties of the Test 

Before dealing with the best possible sequential test that can be devised, 
it may be useful in fixing ideas to discuss a simple, even if not very efficient, 
sequential test based on the binomial waiting-time distribution (6.5.15). 
Suppose $(x_1, x_2, \ldots)$ is a sequence of independent random variables all
having the same c.d.f. $F(x; \theta)$.

Consider the following sequential test $S$ for testing $\mathcal{H}_0$ against $\mathcal{H}_1$.
Let $G_0^0$, $G_0^1$, $G_0$ be disjoint initial sets whose union is the entire real axis,
and let $G_{(n)}^0$, $G_{(n)}^1$, $G_{(n)}$ be the sets $G_0^0$, $G_0^1$, $G_0$ in $R_1^{(n)}$, $n = 1, 2, \ldots$, where
$R_1^{(n)}$ is the sample space of $x_n$, $n = 1, 2, \ldots$. Then it will be seen that the
disjoint events $G_1^0, \ldots, G_n^0, G_1^1, \ldots, G_n^1, G_n$ defined in (15.2.4) can be
constructed as follows:

(15.3.1)
    $G_1^i = \{x_1 \in G_{(1)}^i\}, \quad i = 0, 1$
    $G_2^i = \{(x_1, x_2) \in G_{(1)} \times G_{(2)}^i\}, \quad i = 0, 1$
    $\cdots$
    $G_n^i = \{(x_1, \ldots, x_n) \in G_{(1)} \times \cdots \times G_{(n-1)} \times G_{(n)}^i\}, \quad i = 0, 1$
    $G_n = \{(x_1, \ldots, x_n) \in G_{(1)} \times \cdots \times G_{(n)}\}.$

Thus, the sequential process for $S$ is simply to continue drawing from
$(x_1, x_2, \ldots)$ one $x$ at a time as long as an $x$ falls in $G_0$ on the real axis.
As soon as an $x$ falls in $G_0^0$ we accept $\mathcal{H}_0$, or alternatively as soon as an
$x$ falls in $G_0^1$ we accept $\mathcal{H}_1$. It will be convenient to refer to such a test as
a Cartesian sequential test.

Now let

(15.3.2)
    $a(\theta) = \int_{G_0^0} dF(x; \theta)$
    $b(\theta) = \int_{G_0^1} dF(x; \theta)$
    $c(\theta) = \int_{G_0} dF(x; \theta) = 1 - a(\theta) - b(\theta).$

We shall denote $a(\theta)$, $b(\theta)$, and $c(\theta)$ by $a$, $b$, and $c$, and let $a(\theta_i) = a_i$,
$b(\theta_i) = b_i$, and $c(\theta_i) = c_i$, $i = 0, 1$. Then

(15.3.3)    $P(G_t^0 \mid \theta) = ac^{t-1}, \quad P(G_t^1 \mid \theta) = bc^{t-1}, \quad P(G_n \mid \theta) = c^n$

(15.3.4)
    $L_n(\theta) = \frac{a(1 - c^n)}{1 - c}$
    $M_n(\theta) = \frac{b(1 - c^n)}{1 - c}$
    $N_n(\theta) = c^n.$

If $G_0$ is chosen so that $c(\theta) < 1$, we have $\lim_{n \to \infty} N_n(\theta) = 0$, and

(15.3.5)    $L(\theta) = \frac{a}{a + b}, \quad M(\theta) = \frac{b}{a + b}.$

The probability that the sequential process terminates upon drawing $x_n$
is given by

(15.3.6)    $p(n \mid \theta) = (a + b)c^{n-1}$

and the average sample number is

(15.3.7)    $\mathcal{E}(n \mid \theta) = (a + b) \sum_{n=1}^{\infty} n c^{n-1} = \frac{1}{a + b}.$

It will be seen that $p(n \mid \theta)$ is merely a special case of the binomial waiting-time
distribution (6.5.15) for $k = 1$.
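
The closed forms (15.3.5) and (15.3.7) can be compared with the partial sums in (15.3.4) and (15.3.6). The sketch below uses hypothetical function names and arbitrary illustrative values of $a$ and $b$; it is intended only as a numerical check of the formulas.

```python
def cartesian_limits(a, b):
    """L, M and average sample number from (15.3.5) and (15.3.7)."""
    c = 1 - a - b
    if not 0 <= c < 1:
        raise ValueError("need a + b > 0 and a + b <= 1")
    return a / (a + b), b / (a + b), 1 / (a + b)


def cartesian_partial_sums(a, b, n):
    """L_n, M_n, N_n of (15.3.4) and the first n terms of the series for the ASN."""
    c = 1 - a - b
    L_n = sum(a * c ** (t - 1) for t in range(1, n + 1))
    M_n = sum(b * c ** (t - 1) for t in range(1, n + 1))
    N_n = c ** n
    asn_n = sum(t * (a + b) * c ** (t - 1) for t in range(1, n + 1))
    return L_n, M_n, N_n, asn_n


L, M, asn = cartesian_limits(0.15, 0.05)
L_n, M_n, N_n, asn_n = cartesian_partial_sums(0.15, 0.05, 500)
```

With $a = 0.15$ and $b = 0.05$ the limits are $L = 0.75$, $M = 0.25$ and an average sample number of 5, and the partial sums for $n = 500$ agree with these to many decimal places.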

If the risks of Type I and Type II errors are to be $\alpha$ and $\beta$ at $\theta = \theta_0$
and $\theta = \theta_1$ respectively, then $G_0^0$ and $G_0^1$ must satisfy the equations

(15.3.8)    $\frac{a_0}{a_0 + b_0} = 1 - \alpha, \quad \frac{a_1}{a_1 + b_1} = \beta,$

that is, we must have

(15.3.9)    $\frac{a_0}{b_0} = \frac{1 - \alpha}{\alpha}, \quad \frac{a_1}{b_1} = \frac{\beta}{1 - \beta}.$

If, for specified values of $\alpha$ and $\beta$, and $\mathcal{E}(n \mid \theta_0)$ $(= 1/(a_0 + b_0))$, a
collection of choices exists for $G_0^0$ and $G_0^1$, the choice that yields the strongest
Cartesian sequential test in case $F(x; \theta)$ has a p.d.f. (or p.f.) $f(x; \theta)$ is
suggested by the Neyman-Pearson theorem 13.2.1. That is, we take $G_0^0$
as the set of values of $x$ for which the probability ratio satisfies

(15.3.10)    $\frac{f(x; \theta_1)}{f(x; \theta_0)} < k_0$

and $G_0^1$ as the set for which

(15.3.11)    $\frac{f(x; \theta_1)}{f(x; \theta_0)} > k_1,$

where $k_1 > k_0 > 0$ are chosen so that (15.3.8) is satisfied for $G_0^0$ and $G_0^1$
for specified values of $\alpha$, $\beta$ and $\mathcal{E}(n \mid \theta_0)$. In other words, any other choice
of sets $G_0^0$ and $G_0^1$, say $G_0^{0*}$ and $G_0^{1*}$, which leaves $b_0/(a_0 + b_0)$ fixed at $\alpha$
and $\mathcal{E}(n \mid \theta_0)$ fixed at $n_0$, say (and hence the probability of a Type I error
and average sample number at $\theta_0$ fixed), gives a Cartesian sequential test
$S$ for which $a_1/(a_1 + b_1) > \beta$, that is, for which the probability of a Type
II error exceeds that for the sequential test based on $G_0^0$ and $G_0^1$ as provided
by (15.3.10) and (15.3.11).

This statement holds under essentially the same conditions as those
stated in 13.2.1. The proof is similar to that of 13.2.1, and is left as an
exercise for the reader (Problem 15.6).


Remark. By drawing $r$ $x$'s at a time from the population, one can devise an
$r$-fold Cartesian sequential test $S$. It will be sufficient to consider only the case
where the acceptance and rejection sets in the sample space $R_r$ of $(x_1, \ldots, x_r)$
are determined by the probability ratio. Thus, let $G_0^0$ be the set of points
$(x_1, \ldots, x_r)$ satisfying

(15.3.12)    $\frac{\prod_{\xi=1}^{r} f(x_\xi; \theta_1)}{\prod_{\xi=1}^{r} f(x_\xi; \theta_0)} < k_0$

and $G_0^1$ the set satisfying

(15.3.13)    $\frac{\prod_{\xi=1}^{r} f(x_\xi; \theta_1)}{\prod_{\xi=1}^{r} f(x_\xi; \theta_0)} > k_1$

while $G_0$ is the set satisfying neither inequality, where $k_1 > k_0 > 0$ are chosen
so as to satisfy (15.3.8), $a(\theta)$, $b(\theta)$, and $c(\theta)$ being defined as $P(G_0^0 \mid \theta)$, $P(G_0^1 \mid \theta)$,
and $P(G_0 \mid \theta)$. It can be shown by argument similar to that for 13.2.1 that this
choice of $G_0^0$ and $G_0^1$ yields a stronger Cartesian sequential test than that for any
other choice of $G_0^0$ and $G_0^1$ in the sample space of $(x_1, \ldots, x_r)$ which leaves
$b_0/(a_0 + b_0)$ fixed at $\alpha$ and $\mathcal{E}(n \mid \theta_0)$ fixed.

It will be evident to the reader that if there is difficulty in fixing Type I and
Type II errors at $\alpha$ and $\beta$ for a one-fold Cartesian sequential test, one can fix
these errors as close to $\alpha$ and $\beta$ as one pleases by using an $r$-fold Cartesian
sequential test for a suitably chosen $r$.


(b) Application to Nonparametric Testing 


As mentioned earlier, the Cartesian sequential test may not be the
best sequential test that can be devised. However, if nothing is known
about $F(x; \theta)$ except that it is continuous, then a Cartesian sequential
test provides a nonparametric sequential test. To see this let $F(x; \theta_0)$,
$F(x; \theta_1)$ be continuous, and let them be denoted by $F_0(x)$ and $F_1(x)$
respectively. Let us choose $G_0^0$ and $G_0^1$ as the intervals $(-\infty, a)$ and
$(b, +\infty)$, $a < b$; then (15.3.8) can be written as

(15.3.14)    $\frac{F_0(a)}{F_0(a) + 1 - F_0(b)} = 1 - \alpha, \quad \frac{F_1(a)}{F_1(a) + 1 - F_1(b)} = \beta,$

and the Cartesian sequential test based on this choice of $G_0^0$ and $G_0^1$
becomes a test for accepting a continuous c.d.f. $F_0(x)$ with values $F_0(a)$
and $1 - \alpha F_0(a)/(1 - \alpha)$ at $a$ and $b$ respectively against the alternative of
accepting a continuous c.d.f. with values $F_1(a)$ and $1 - (1 - \beta)F_1(a)/\beta$
at $a$ and $b$ respectively. The average sample size is $[F_0(a) + 1 - F_0(b)]^{-1}$
if $F_0(x)$ is the distribution actually sampled, and $[F_1(a) + 1 - F_1(b)]^{-1}$ if
$F_1(x)$ is the distribution actually sampled.
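
The values at $b$ quoted above follow from solving (15.3.14) for $F_0(b)$ and $F_1(b)$. A small sketch, with hypothetical function names and illustrative numerical values that are not taken from the text, is:

```python
def null_cdf_at_b(F0_a, alpha):
    """F_0(b) obtained from the first equation of (15.3.14)."""
    return 1 - alpha * F0_a / (1 - alpha)


def alternative_cdf_at_b(F1_a, beta):
    """F_1(b) obtained from the second equation of (15.3.14)."""
    return 1 - (1 - beta) * F1_a / beta


def average_sample_size(F_a, F_b):
    """Average sample size [F(a) + 1 - F(b)]^(-1) for the sampled c.d.f. F."""
    return 1 / (F_a + 1 - F_b)


# Illustrative values (assumed): F_0(a) = 0.30, alpha = 0.05,
# F_1(a) = 0.02, beta = 0.10.
F0_b = null_cdf_at_b(0.30, 0.05)
F1_b = alternative_cdf_at_b(0.02, 0.10)
asn_null = average_sample_size(0.30, F0_b)
asn_alt = average_sample_size(0.02, F1_b)
```

Substituting the computed values back into (15.3.14) recovers $1 - \alpha$ and $\beta$, which gives a quick check that the two boundary values have been solved for correctly.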


15.4 THE PROBABILITY RATIO SEQUENTIAL TEST 


(a) Definition of the Test 

The discussion in the preceding section of the choices of $G_0^0$, $G_0^1$, and $G_0$
which lead to strongest Cartesian sequential tests for testing $\mathcal{H}_0$ against
$\mathcal{H}_1$ suggests what, indeed, turns out to be the best possible choice of the
sets $G_1^0, \ldots, G_n^0, G_1^1, \ldots, G_n^1, G_n$, $n = 1, 2, \ldots$, for a sequential test.
Namely, to choose $G_n^0$, $G_n^1$, and $G_n$ as disjoint events in $R_n$ satisfying
(15.2.1) for $n = 1, 2, \ldots$, defined for each $n$ by the respective inequalities:

(15.4.1)    $\frac{Q_{1n}}{Q_{0n}} \le k_0, \quad \frac{Q_{1n}}{Q_{0n}} \ge k_1, \quad k_0 < \frac{Q_{1n}}{Q_{0n}} < k_1$

where

(15.4.2)    $Q_{in} = \prod_{\xi=1}^{n} f(x_\xi; \theta_i), \quad i = 0, 1$

and where $f(x; \theta_0)$ and $f(x; \theta_1)$ are p.f.'s or p.d.f.'s absolutely continuous
with respect to each other. It is sufficient to consider only the case of
p.d.f.'s throughout this section. The results obtained for p.d.f.'s hold
with trivial changes for p.f.'s.

This test was originally proposed by Wald (1945), and is called the
probability ratio sequential test. The test therefore consists of accepting
$\mathcal{H}_0$, accepting $\mathcal{H}_1$, or drawing $x_{n+1}$ after drawing $x_1, \ldots, x_n$ according to whether
$(x_1, \ldots, x_n)$ satisfies the first, second, or third inequality in (15.4.1),
$n = 1, 2, \ldots$, where $k_0$ and $k_1$ are fixed so that

(15.4.3)    $L(\theta_0) = 1 - \alpha, \quad L(\theta_1) = \beta$

that is, so that the probabilities of Type I and Type II errors are $\alpha$ and $\beta$.
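
The step-by-step character of the test can be sketched in code. The example below is an illustration under assumptions that are not part of the text: the observations are taken to be 0 or 1 with $P(x = 1) = \theta$, so that $f(x; \theta) = \theta^x (1 - \theta)^{1 - x}$, the function name is hypothetical, and the constants $k_0$ and $k_1$ are simply supplied by the caller. The ratio $Q_{1n}/Q_{0n}$ is accumulated on the logarithmic scale.

```python
from math import log


def sprt_bernoulli(observations, theta0, theta1, k0, k1):
    """Apply the three inequalities of (15.4.1) after each drawing.

    Returns a pair (decision, n) where decision is "accept H0",
    "accept H1", or "continue" if the observations run out first.
    """
    log_ratio = 0.0  # log(Q_1n / Q_0n)
    n = 0
    for n, x in enumerate(observations, start=1):
        if x:
            log_ratio += log(theta1 / theta0)
        else:
            log_ratio += log((1 - theta1) / (1 - theta0))
        if log_ratio <= log(k0):
            return "accept H0", n
        if log_ratio >= log(k1):
            return "accept H1", n
    return "continue", n
```

For example, with $\theta_0 = 0.5$, $\theta_1 = 0.8$, $k_0 = 0.1$ and $k_1 = 10$, an unbroken run of ones leads to acceptance of $\mathcal{H}_1$ at the fifth drawing, since $1.6^4 < 10 \le 1.6^5$, while a run of zeros leads to acceptance of $\mathcal{H}_0$ at the third drawing, since $0.4^3 \le 0.1 < 0.4^2$.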

(b) Properties of the Distribution Function of Number of Trials Required 
to Terminate the Probability Ratio Sequential Process 

First let us consider the question of whether the probability ratio
sequential process terminates with probability 1. Let

(15.4.4)    $z = \log \left( \frac{f(x; \theta_1)}{f(x; \theta_0)} \right)$

and let $H(z; \theta)$ be the c.d.f. of $z$ for a fixed value of $\theta$ in the parameter
space $\Omega$. When $\theta_0 \ne \theta_1$, $z$ is assumed to be a nondegenerate random
variable. Let

(15.4.5)    $z_t = \log \left( \frac{f(x_t; \theta_1)}{f(x_t; \theta_0)} \right), \quad t = 1, 2, \ldots.$

Then $(z_1, z_2, \ldots)$ is a stochastic process generated by simple random
sampling from a c.d.f. $H(z; \theta)$ and the sequential process terminates for
the smallest integer $n$ for which the inequality

(15.4.6)    $\log k_0 < z_1 + \cdots + z_n < \log k_1$

fails to hold. Let us keep in mind that $G_n$ is the event for which the
inequality (15.4.6) holds, and hence is the event resulting in the drawing of
more than $n$ $x$'s before the sequential process terminates.

Let $\zeta_1 = z_1 + \cdots + z_r$, $\zeta_2 = z_{r+1} + \cdots + z_{2r}$, $\ldots$, $\zeta_t = z_{(t-1)r+1} +
\cdots + z_{tr}$, $\ldots$. If the sequential process never terminates then we must
have

(15.4.7)    $\zeta_i^2 < D^2, \quad i = 1, 2, \ldots,$

where $D = |\log k_0| + |\log k_1|$. Since $\zeta_1, \zeta_2, \ldots$ is a sequence of independent
random variables having the same c.d.f., say $J(\zeta; \theta)$, the probability
that the inequalities (15.4.7) for $i = 1, 2, \ldots, m$ hold is $p^m$ where

(15.4.8)    $p = P(\zeta_i^2 < D^2), \quad i = 1, 2, \ldots.$

Since $z$ is nondegenerate, it has a positive variance $\sigma^2$. The variance of $\zeta_i$
is $r\sigma^2$ and can be made to exceed $D^2$ by choosing $r > D^2/\sigma^2$, in which case
we have $\mathcal{E}(\zeta_i^2) > D^2$, and hence $p < 1$.

Therefore, taking $r > D^2/\sigma^2$ and letting $m \to \infty$, we find the probability
of failure of termination of the sequential process to be $\lim_{m \to \infty} p^m = 0$.
Summarizing,

15.4.1 If the random variable $z$ defined by (15.4.4) has a finite mean, and
finite positive variance, the probability ratio sequential process
terminates with probability 1.

As a matter of fact, a stronger theorem due to Stein (1946) holds for 
the distribution function of the number of trials required to terminate the 
sequential process. A modified form of Stein’s theorem is as follows: 

15.4.2 If the random variable $z$ is nondegenerate all moments of $n$ exist.

To establish 15.4.2 it is sufficient to consider the moment-generating
function $\psi(u)$ of $n$

(15.4.9)    $\psi(u) = \mathcal{E}(e^{un}) = \sum_{t=1}^{\infty} e^{tu} P(G_t^* \mid \theta)$

where $G_t^* = G_t^0 \cup G_t^1$ and is the event resulting in the termination of the
sequential process at the $t$th drawing. By breaking up the series on the
right of (15.4.9) into blocks of $r$ terms each, it will be seen that for $u > 0$

$\psi(u) \le e^{ru} P(0 < n \le r) + e^{2ru} P(r < n \le 2r) + \cdots$

$\le e^{ru} P(n > 0) + e^{2ru} P(n > r) + \cdots.$

But we know from (15.4.8) that if $z$ has positive variance

$P(n > mr) \le p^m.$

Hence

(15.4.10)    $\psi(u) \le e^{ru} + e^{2ru} p + e^{3ru} p^2 + \cdots = e^{ru}(1 - e^{ru} p)^{-1}.$

If $u < 0$, we similarly find

(15.4.10a)    $\psi(u) \le e^{u}(1 - e^{ru} p)^{-1}.$

Therefore, if $u$ is any (real) number which satisfies $e^{ru} p < 1$, $\psi(u)$ converges.
The $u$ interval for which this inequality holds contains $u = 0$ as an
interior point and it follows that the $k$th derivative of $\psi(u)$ exists at $u = 0$,
which yields the $k$th moment of $n$, for $k = 1, 2, \ldots$.

Finally, the following result (which can be restated as a corollary to
15.2.1) is important in obtaining an expression for $\mathcal{E}(n \mid \theta)$, and in
examining the efficiency of a probability ratio sequential test:

15.4.3 If $z$ is a nondegenerate random variable, then

(15.4.11)    $\mathcal{E}(z_1 + z_2 + \cdots + z_n \mid \theta) = \mathcal{E}(z \mid \theta) \cdot \mathcal{E}(n \mid \theta).$

The finiteness of $\mathcal{E}(n \mid \theta)$ follows at once from 15.4.2 and the remaining
argument follows from 15.2.1.

(c) Determination of the Boundary Constants $k_0$ and $k_1$ for the Probability
Ratio Sequential Tests

Now consider the problem of determining the boundary constants $k_0$
and $k_1$ for a probability ratio sequential test of strength $(\alpha, \theta_0; \beta, \theta_1)$.
The exact determination of $k_0$ and $k_1$ is a difficult problem, although close
inequalities and good approximations are fairly easy to find.

It follows from the definition of the probability ratio sequential test,
that $\mathcal{H}_0$ will be accepted upon the drawing of $x_n$ if the first inequality in
(15.4.1) is satisfied. Since we have denoted this event by $G_n^0$, we have at
all points in $G_n^0$

(15.4.12)    $\frac{Q_{1n}}{Q_{0n}} \le k_0, \quad n = 1, 2, \ldots.$

Referring to (15.2.8), we have

(15.4.13)    $L(\theta) = P(G_1^0 \mid \theta) + P(G_2^0 \mid \theta) + \cdots.$

If $L(\theta)$ is to be of strength $(\alpha, \theta_0; \beta, \theta_1)$, then we must have

(15.4.14)    $L(\theta_0) = 1 - \alpha, \quad L(\theta_1) = \beta.$

It follows from (15.4.12) that

(15.4.15)    $P(G_n^0 \mid \theta_1) \le k_0 P(G_n^0 \mid \theta_0), \quad n = 1, 2, \ldots.$

Therefore

$\sum_{n=1}^{\infty} P(G_n^0 \mid \theta_1) \le k_0 \sum_{n=1}^{\infty} P(G_n^0 \mid \theta_0)$

that is,

(15.4.16)    $L(\theta_1) \le k_0 L(\theta_0).$

Substituting from (15.4.14) we find that $k_0$ must satisfy the inequality

$k_0 \ge \frac{\beta}{1 - \alpha}.$

In a similar manner by considering the event $G_n^1$, resulting in the acceptance
of $\mathcal{H}_1$ upon drawing $x_n$, $n = 1, 2, \ldots$, in which the second inequality
of (15.4.1) holds, we find that

(15.4.17)    $M(\theta_1) \ge k_1 M(\theta_0).$

Since $N(\theta) = 0$, we have $M(\theta) = 1 - L(\theta)$. Making the substitution in
(15.4.17) and using (15.4.14), we find that $k_1$ must satisfy

$k_1 \le \frac{1 - \beta}{\alpha}.$

Therefore,

15.4.4 The constants $k_0$ and $k_1$ for the probability ratio sequential test of
strength $(\alpha, \theta_0; \beta, \theta_1)$ satisfy the inequalities

(15.4.18)    $k_0 \ge \frac{\beta}{1 - \alpha}, \quad k_1 \le \frac{1 - \beta}{\alpha}.$


Let us now examine the actual strength of a probability ratio sequential 
test if, for a given a and we choose 

(15.4.19) k„ = -A-, fc, = 

1 — a a 


It is convenient to think of (a, 6q; jS, 0i) as the intended strength of the test, 
and (a', \ Qy) as the actual strength of the test. Then it follows from 

(15.4.18) that 


(15.4.20) 


8 p' ^ 1-/3 1 -/ 3 ' 

—^— and - - <- — . 

1 — a 1 — a' a a' 


It follows from these two inequalities that 


(15.4.21) 
and also that 



and a' < —^ 
1-/3 


(15.4.22) a' + /S' < a + /3. 

The last inequality simply states that the sum of the probabilities of actual
Type I and Type II errors for the choice of $k_0$ and $k_1$ given by (15.4.19)
cannot exceed the sum of the probabilities of the intended Type I and
Type II errors. Therefore, we have

15.4.5 If the constants $k_0$ and $k_1$ for a probability ratio sequential test of
intended strength $(\alpha, \theta_0; \beta, \theta_1)$ are actually chosen as $\beta/(1 - \alpha)$ and
$(1 - \beta)/\alpha$, respectively, the actual strength of the resulting test is
$(\alpha', \theta_0; \beta', \theta_1)$, where $\alpha'$ and $\beta'$ satisfy inequalities (15.4.21) and
(15.4.22).

In practical applications the intended strength $(\alpha, \theta_0; \beta, \theta_1)$ has values of
$\alpha$ and $\beta$ rarely exceeding 0.10, and the inequalities (15.4.21) and (15.4.22)
indicate that the difference between the actual strength and intended
strength from a practical point of view is not very important.
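As a numerical illustration of (15.4.19), (15.4.21) and (15.4.22), the following minimal sketch computes the approximate constants and the resulting bounds on the actual error probabilities. The function names are illustrative only and do not come from the text.

```python
def wald_thresholds(alpha, beta):
    """Approximate constants k0, k1 chosen as in (15.4.19)."""
    k0 = beta / (1.0 - alpha)
    k1 = (1.0 - beta) / alpha
    return k0, k1


def actual_error_bounds(alpha, beta):
    """Upper bounds on alpha' and beta' from (15.4.21), and on alpha' + beta' from (15.4.22)."""
    return alpha / (1.0 - beta), beta / (1.0 - alpha), alpha + beta


k0, k1 = wald_thresholds(0.05, 0.10)
alpha_max, beta_max, sum_max = actual_error_bounds(0.05, 0.10)
print(k0, k1)                        # thresholds for the probability ratio
print(alpha_max, beta_max, sum_max)  # bounds on the actual error probabilities
```

For intended errors of 0.05 and 0.10 the bounds on the actual errors exceed the intended values only slightly, which is the point made in the paragraph above.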

(d) The Operating Characteristic Function of the Probability Ratio 
Sequential Test 

Thus far we have considered the operating characteristic function $L(\theta)$
for only two values of $\theta$, namely, $\theta_0$ and $\theta_1$. The exact determination of
$L(\theta)$ for an arbitrary $\theta$ in the parameter space $\Omega$ is a difficult problem.
However, we can determine an approximate expression for $L(\theta)$ without
undue difficulty.

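It will be convenient at the outset to establish the following lemma due
to Wald (1945):

15.4.6 Let $z = \log f(x; \theta_1) - \log f(x; \theta_0)$, where $x$ is a random variable
having p.d.f. $f(x; \theta)$, $\theta$ being a point in a parameter space $\Omega$
containing $\theta_0$ and $\theta_1$. If

(i) $\mathcal{E}(e^{hz})$ exists for every real $h$,

(ii) $\mathcal{E}(z) \ne 0$,

(iii) for some $\delta_1 > 0$ and $0 < \delta_2 < 1$, $P(e^z > 1 + \delta_1) > 0$, and
$P(e^z < 1 - \delta_2) > 0$,

then, for each $\theta$ there is an $h$, say $h(\theta)$, $\ne 0$ such that

(15.4.23)    $\displaystyle\int_{-\infty}^{\infty} \left[\frac{f(x; \theta_1)}{f(x; \theta_0)}\right]^{h(\theta)} f(x; \theta)\, dx = 1,$

that is,

$\mathcal{E}(e^{h(\theta) z}) = 1.$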

To establish 15.4.6, let $E_1$ be the set of values of $x$ for which $e^z > 1 + \delta_1$
and $E_2$ be the set for which $e^z < 1 - \delta_2$. Then we have for $h > 0$,

(15.4.24)    $\mathcal{E}(e^{hz}) \ge \displaystyle\int_{E_1} e^{hz} f(x; \theta)\, dx \ge (1 + \delta_1)^h P(E_1)$

and for $h < 0$

(15.4.25)    $\mathcal{E}(e^{hz}) \ge \displaystyle\int_{E_2} e^{hz} f(x; \theta)\, dx \ge (1 - \delta_2)^h P(E_2).$

Since $P(E_1)$ and $P(E_2)$ are $> 0$, it is evident from (15.4.24) and (15.4.25)
that

(15.4.26)    $\displaystyle\lim_{h \to +\infty} \mathcal{E}(e^{hz}) = \lim_{h \to -\infty} \mathcal{E}(e^{hz}) = +\infty.$

Denoting $\mathcal{E}(e^{hz})$ by $\psi(h)$ and the first and second derivatives of $\psi(h)$ by
$\psi'(h)$ and $\psi''(h)$, we have

(15.4.27)    $\psi'(0) = \mathcal{E}(z) \ne 0$

and

(15.4.28)    $\psi''(h) = \mathcal{E}(z^2 e^{hz}) > 0.$

Since $\psi(0) = 1$, $\psi'(0) \ne 0$, and $\psi''(h) > 0$ it is evident that $\psi(h)$, which
depends on $\theta$ as well as $h$, has a minimum less than 1 and furthermore that
$\psi(h) - 1 = 0$ has two roots, $h = 0$ and $h = h(\theta)$, where $h(\theta) \ne 0$, thus
concluding the argument for 15.4.6.
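The convexity argument above also suggests a simple numerical way of locating the nonzero root. The sketch below is only an illustration under stated assumptions: $z$ is taken to be discrete, given as (value, probability) pairs, the names `psi`, `nonzero_root` and `bernoulli_support` are invented for the example, and plain bisection is used rather than any method from the text.

```python
import math


def psi(h, support):
    """psi(h) = E(exp(h z)) for a discrete z given as (value, probability) pairs."""
    return sum(p * math.exp(h * z) for z, p in support)


def nonzero_root(support, tol=1e-12):
    """Return h != 0 with psi(h) = 1, assuming conditions (i)-(iii) of 15.4.6 hold."""
    mean_z = sum(p * z for z, p in support)
    if mean_z == 0:
        raise ValueError("E(z) = 0, so condition (ii) fails")
    # psi'(0) = E(z): psi dips below 1 on the positive side when E(z) < 0.
    sign = 1.0 if mean_z < 0 else -1.0
    hi = sign
    while psi(hi, support) <= 1.0:   # move outward until beyond the root
        hi *= 2.0
    lo = hi / 2.0
    while psi(lo, support) >= 1.0:   # move inward until strictly between 0 and the root
        lo /= 2.0
    while abs(hi - lo) > tol:        # bisection: psi(lo) < 1 < psi(hi)
        mid = 0.5 * (lo + hi)
        if psi(mid, support) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def bernoulli_support(theta, theta0, theta1):
    """Values of z = log f(x; theta1) - log f(x; theta0) for x = 1, 0 under parameter theta."""
    return [(math.log(theta1 / theta0), theta),
            (math.log((1 - theta1) / (1 - theta0)), 1 - theta)]


h_at_theta0 = nonzero_root(bernoulli_support(0.3, 0.3, 0.5))
h_at_theta1 = nonzero_root(bernoulli_support(0.5, 0.3, 0.5))
```

In this Bernoulli illustration the roots at $\theta = \theta_0$ and $\theta = \theta_1$ come out as $1$ and $-1$ up to the tolerance, as (15.4.23) requires.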

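Now let us consider the problem of approximating $L(\theta)$. The event $G_n^0$
for which $\mathcal{H}_0$ is accepted upon drawing $x_n$ is determined by the first
inequality in (15.4.1). The same event is determined by the inequality

(15.4.29)    $C_n^* \le k_0^h C_n$

where

(15.4.30)    $C_n^* = \left[\dfrac{\prod_{i=1}^n f(x_i; \theta_1)}{\prod_{i=1}^n f(x_i; \theta_0)}\right]^h \displaystyle\prod_{i=1}^n f(x_i; \theta)$  and  $C_n = \displaystyle\prod_{i=1}^n f(x_i; \theta),$

and $h$ is any real number (the inequality as written holds for $h > 0$ and is
reversed for $h < 0$). Now $C_n^*$ can be written as

(15.4.31)    $C_n^* = \displaystyle\prod_{i=1}^n \left[\left(\frac{f(x_i; \theta_1)}{f(x_i; \theta_0)}\right)^h f(x_i; \theta)\right].$

Under the conditions of 15.4.6, there is for each $\theta$ an $h$, say $h(\theta)$, $\ne 0$
such that

(15.4.32)    $\displaystyle\int \left(\frac{f(x; \theta_1)}{f(x; \theta_0)}\right)^{h(\theta)} f(x; \theta)\, dx = 1.$

Therefore the function $f^*(x; \theta)$ defined by

(15.4.33)    $f^*(x; \theta) = \left(\dfrac{f(x; \theta_1)}{f(x; \theta_0)}\right)^{h(\theta)} f(x; \theta)$

has the properties of a p.d.f.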

Let $P^*(G^0 \mid \theta)$ be the probability of the event $G^0$ evaluated from $f^*(x; \theta)$
in exactly the same way that $P(G^0 \mid \theta)$ is the probability of $G^0$ evaluated
from $f(x; \theta)$. Defining $L^*(\theta)$ similar to $L(\theta)$ in (15.2.8), let

$L^*(\theta) = \displaystyle\sum_{n=1}^{\infty} P^*(G_n^0 \mid \theta) = P^*(G^0 \mid \theta).$

Then by argument similar to that used in deriving (15.4.16) we find

(15.4.34)    $L^*(\theta) \le k_0^{h(\theta)} L(\theta).$




The event $G_n'$ which results in accepting $\mathcal{H}_1$ and which is defined by the
second inequality of (15.4.1) is equivalently defined by the inequality

(15.4.35)    $C_n^* \ge k_1^h C_n.$

Following a line of argument similar to that by which (15.4.34) was
obtained we find, corresponding to (15.4.17),

(15.4.36)    $1 - L^*(\theta) \ge k_1^{h(\theta)} (1 - L(\theta)).$

Reasoning similar to that underlying the replacement of the inequalities
in (15.4.18) by the equalities (15.4.19) for approximating values of $k_0$ and
$k_1$ suggests replacement of the inequalities (15.4.34) and (15.4.36) by
equalities in order to approximate $L(\theta)$. The resulting approximation is
found to be

(15.4.37)    $L(\theta) \approx \dfrac{k_1^{h(\theta)} - 1}{k_1^{h(\theta)} - k_0^{h(\theta)}}.$

Since $h(\theta_0) = 1$ and $h(\theta_1) = -1$, it will be noted that if the values of $k_0$
and $k_1$ are approximated by (15.4.19) the approximation (15.4.37) gives
the correct values for $L(\theta_0)$ and $L(\theta_1)$ for the probability ratio sequential
test of strength $(\alpha, \theta_0; \beta, \theta_1)$, namely, those in (15.4.14).

The approximation (15.4.37) for $L(\theta)$ is satisfactory for practical
purposes. To examine precisely how accurate the approximation is for
values of $\theta$ different from $\theta_0$ and $\theta_1$ is a rather tedious piece of analysis
which is omitted. But the details of such an analysis will be found in
Wald's book (1947a).

We may summarize as follows:

15.4.7 Under the conditions of 15.4.6, an approximation for the operating
characteristic function $L(\theta)$ for a probability ratio sequential test
of strength $(\alpha, \theta_0; \beta, \theta_1)$ is given by (15.4.37), where $h(\theta)$ satisfies
(15.4.32), and where $k_0$ and $k_1$ are approximated by (15.4.19).
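The approximation summarized in 15.4.7 is easy to evaluate once $h(\theta)$ is known. The following sketch, whose function name is illustrative, takes $h$ as an input and uses the constants of (15.4.19); it checks the two values $h = 1$ and $h = -1$ discussed above.

```python
def oc_approx(h, alpha, beta):
    """Approximate L(theta) from (15.4.37), with k0 and k1 taken from (15.4.19).

    h is the nonzero root h(theta) of (15.4.32); it is supplied by the caller.
    """
    k0 = beta / (1.0 - alpha)
    k1 = (1.0 - beta) / alpha
    return (k1 ** h - 1.0) / (k1 ** h - k0 ** h)


L_at_theta0 = oc_approx(1.0, 0.05, 0.10)    # h(theta0) = 1
L_at_theta1 = oc_approx(-1.0, 0.05, 0.10)   # h(theta1) = -1
print(L_at_theta0, L_at_theta1)
```

With $\alpha = 0.05$ and $\beta = 0.10$ the two values agree with $1 - \alpha$ and $\beta$ up to rounding, as stated in the text.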

(e) The Average Sample Number of the Probability Ratio Sequential Test 

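To obtain an approximate expression for $\mathcal{E}(n \mid \theta)$, we may proceed as
follows: If $\mathcal{E}(z \mid \theta) \ne 0$, we find from (15.4.11) that

(15.4.38)    $\mathcal{E}(n \mid \theta) = \dfrac{\mathcal{E}(z_1 + z_2 + \cdots + z_n \mid \theta)}{\mathcal{E}(z \mid \theta)}.$

As before let $G^0 = G_1^0 \cup G_2^0 \cup \cdots$ and $G' = G_1' \cup G_2' \cup \cdots$. Then
$P(G^0 \mid \theta) = L(\theta)$, and $P(G' \mid \theta) = 1 - L(\theta)$ and we can write

(15.4.39)    $\mathcal{E}(z_1 + z_2 + \cdots + z_n \mid \theta) = \mathcal{E}(z_1 + z_2 + \cdots + z_n \mid G^0; \theta) L(\theta)$
                 $+ \mathcal{E}(z_1 + z_2 + \cdots + z_n \mid G'; \theta)(1 - L(\theta)).$

But $\mathcal{E}(z_1 + z_2 + \cdots + z_n \mid G^0; \theta) \le \log k_0$ and $\mathcal{E}(z_1 + z_2 + \cdots + z_n \mid G'; \theta)
\ge \log k_1$. Replacing $k_0$ and $k_1$ in these inequalities by the values indicated
in (15.4.19) and replacing the inequalities by equalities we find that (15.4.38)
yields the following approximation for $\mathcal{E}(n \mid \theta)$:

(15.4.40)    $\mathcal{E}(n \mid \theta) \approx \dfrac{L(\theta) \log \left(\dfrac{\beta}{1 - \alpha}\right) + (1 - L(\theta)) \log \left(\dfrac{1 - \beta}{\alpha}\right)}{\mathcal{E}(z \mid \theta)}.$

At $\theta_0$ and $\theta_1$ the approximate values of $\mathcal{E}(n \mid \theta)$ are

(15.4.41)    $\mathcal{E}(n \mid \theta_0) \approx \dfrac{(1 - \alpha) \log \left(\dfrac{\beta}{1 - \alpha}\right) + \alpha \log \left(\dfrac{1 - \beta}{\alpha}\right)}{\mathcal{E}(z \mid \theta_0)},$

             $\mathcal{E}(n \mid \theta_1) \approx \dfrac{\beta \log \left(\dfrac{\beta}{1 - \alpha}\right) + (1 - \beta) \log \left(\dfrac{1 - \beta}{\alpha}\right)}{\mathcal{E}(z \mid \theta_1)}.$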

These approximations are satisfactory for practical purposes. The 
problem of placing bounds on the error of the approximation in (15.4.40) 
requires considerable analysis which we omit. The reader interested in the 
details is referred to Wald (1947a). 


(f) The Efficiency of the Probability Ratio Sequential Test 

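The following theorem due to Wald (1945) provides lower bounds for
$\mathcal{E}(n \mid \theta)$ at $\theta_0$ and $\theta_1$ for any sequential test:

15.4.8 Let $(x_1, x_2, \ldots)$ be a sequence of independent random variables
having the p.d.f. $f(x; \theta)$, such that $\mathcal{E}(z \mid \theta) \ne 0$ where $z$ is defined in
(15.4.4). Let $S$ be any sequential test of strength $(\alpha, \theta_0; \beta, \theta_1)$
which terminates with probability 1. Then

(15.4.42)    $\mathcal{E}(n \mid \theta_0) \ge \dfrac{(1 - \alpha) \log \left(\dfrac{\beta}{1 - \alpha}\right) + \alpha \log \left(\dfrac{1 - \beta}{\alpha}\right)}{\mathcal{E}(z \mid \theta_0)},$

             $\mathcal{E}(n \mid \theta_1) \ge \dfrac{\beta \log \left(\dfrac{\beta}{1 - \alpha}\right) + (1 - \beta) \log \left(\dfrac{1 - \beta}{\alpha}\right)}{\mathcal{E}(z \mid \theta_1)}.$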


There will be no ambiguity if we denote by $G_n^0$, $G_n'$, $G_n$ the basic events
in the sample space of $(x_1, \ldots, x_n)$, $n = 1, 2, \ldots$, satisfying (15.2.1)
which define an arbitrary sequential test $S$. [In case $S$ is a probability ratio
sequential test, $G_n^0$, $G_n'$, $G_n$, $n = 1, 2, \ldots$, are defined by (15.2.1) and
the three inequalities in (15.4.1).]

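As usual let $z_1, z_2, \ldots$ be defined as in (15.4.5). It follows from (15.4.11)
that for $S$ we will have

(15.4.43)    $\mathcal{E}(n \mid \theta) = \dfrac{\mathcal{E}(z_1 + z_2 + \cdots + z_n \mid \theta)}{\mathcal{E}(z \mid \theta)}.$

But

(15.4.44)    $\mathcal{E}(z_1 + z_2 + \cdots + z_n \mid \theta) = \mathcal{E}(z_1 + z_2 + \cdots + z_n \mid G^0; \theta) P(G^0 \mid \theta)$
                 $+ \mathcal{E}(z_1 + z_2 + \cdots + z_n \mid G'; \theta) P(G' \mid \theta).$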
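First, let us consider $\mathcal{E}(n \mid \theta_0)$. Since $S$ is of strength $(\alpha, \theta_0; \beta, \theta_1)$, we have

(15.4.45)    $P(G^0 \mid \theta_0) = 1 - \alpha, \qquad P(G' \mid \theta_0) = \alpha.$

Making use of the fact that for any random variable $y$

(15.4.46)    $\mathcal{E}(y) \le \log \mathcal{E}(e^y),$

we have

(15.4.47)    $\mathcal{E}(z_1 + z_2 + \cdots + z_n \mid G^0; \theta_0) \le \log \mathcal{E}(e^{z_1 + \cdots + z_n} \mid G^0; \theta_0)$
                 $= \log \mathcal{E}\left(\displaystyle\prod_{i=1}^n \frac{f(x_i; \theta_1)}{f(x_i; \theta_0)} \,\Big|\, G^0; \theta_0\right),$

where $\mathcal{E}(\,\cdot \mid G^0; \theta_0)$ is the conditional mean value, over $G^0$ ($= G_1^0 \cup
G_2^0 \cup \cdots$) when $\theta = \theta_0$, of the product $\dfrac{f(x_1; \theta_1)}{f(x_1; \theta_0)} \cdots \dfrac{f(x_n; \theta_1)}{f(x_n; \theta_0)}$, taken
until the sequential procedure terminates. But

(15.4.48)    $\mathcal{E}\left(\displaystyle\prod_{i=1}^n \frac{f(x_i; \theta_1)}{f(x_i; \theta_0)} \,\Big|\, G^0; \theta_0\right) = P(G^0 \mid \theta_1)/P(G^0 \mid \theta_0) = \dfrac{\beta}{1 - \alpha}.$

Therefore

(15.4.49)    $\mathcal{E}(z_1 + z_2 + \cdots + z_n \mid G^0; \theta_0) \le \log \dfrac{\beta}{1 - \alpha}.$

Similarly,

(15.4.50)    $\mathcal{E}(z_1 + z_2 + \cdots + z_n \mid G'; \theta_0) \le \log \dfrac{1 - \beta}{\alpha}.$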


Substituting from (15.4.45), (15.4.49), and (15.4.50) into (15.4.44), we obtain 
the first inequality in (15.4.42). The second inequality is obtained in an 
entirely similar manner, thus concluding the argument for 15.4.8. 

Further studies of lower bounds for $\mathcal{E}(n \mid \theta)$ have been made by
Hoeffding (1960) and Kiefer and Weiss (1957). Anderson (1960) and
Donnelly (1957) have made some modifications of the probability ratio
test in the case of sampling from a normal distribution with known
variance but unknown mean, so as to reduce $\mathcal{E}(n \mid \theta)$.

In the case of a probability ratio sequential test, that is, where $G_n^0$, $G_n'$, $G_n$
are determined by (15.2.1) and by the inequalities in (15.4.1), we have seen
that

(15.4.51)    $\mathcal{E}(z_1 + z_2 + \cdots + z_n \mid G^0; \theta) \le \log k_0,$
             $\mathcal{E}(z_1 + z_2 + \cdots + z_n \mid G'; \theta) \ge \log k_1.$

If, when a sample point $(x_1, \ldots, x_n)$ falls in $G_n^0$ (that is, when $z_1 + \cdots
+ z_n \le \log k_0$) upon drawing $x_n$, we arbitrarily assign $z_1 + \cdots + z_n$ the
value $\log k_0$, and if when $(x_1, \ldots, x_n)$ falls in $G_n'$ (that is, when $z_1 + \cdots
+ z_n \ge \log k_1$) upon drawing $x_n$ we arbitrarily assign $z_1 + \cdots + z_n$ the
value $\log k_1$, then the inequalities in (15.4.49) and (15.4.50) become
equalities and the expressions (15.4.41) become equalities for $\mathcal{E}(n \mid \theta_0)$ and
$\mathcal{E}(n \mid \theta_1)$. Thus, it is seen that the equalities (15.4.42) are actually realized
in the case of a probability ratio sequential test modified by the approximation
under which $z_1 + \cdots + z_n$, $n = 1, 2, \ldots$, is assigned the value $\log k_0$
or $\log k_1$ in accordance with the rule mentioned above. Under these
conditions, therefore, no sequential test exists which would be more
efficient for testing $\mathcal{H}_0$ against $\mathcal{H}_1$ than the modified probability ratio
test. This heuristic argument concerning the optimum character of the
probability ratio sequential test was originally put forth by Wald (1945).
A more complete and rigorous argument was given later by Wald and
Wolfowitz (1948).
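To give a feeling for the size of the bound in (15.4.42), the sketch below evaluates it at $\theta_0$ for sampling from $N(\theta, 1)$ and sets it beside the sample size of the usual fixed-sample one-sided test with the same error probabilities. Both the choice of the normal case, for which $\mathcal{E}(z \mid \theta_0) = -\tfrac{1}{2}(\theta_1 - \theta_0)^2$, and the fixed-sample formula $[(y_{1-\alpha} + y_{1-\beta})/(\theta_1 - \theta_0)]^2$ are assumptions brought in for the comparison; they are not taken from this section, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def wald_lower_bound_h0(alpha, beta, mean_z0):
    """First bound in (15.4.42): lower bound for E(n | theta0) for any test of this strength."""
    num = (1 - alpha) * math.log(beta / (1 - alpha)) + alpha * math.log((1 - beta) / alpha)
    return num / mean_z0


def fixed_sample_size_normal(alpha, beta, theta0, theta1):
    """Assumed comparison: n required by the fixed-sample one-sided test for N(theta, 1)."""
    y = NormalDist().inv_cdf
    return ((y(1 - alpha) + y(1 - beta)) / (theta1 - theta0)) ** 2


theta0, theta1, alpha, beta = 0.0, 1.0, 0.05, 0.05
mean_z0 = -0.5 * (theta1 - theta0) ** 2          # E(z | theta0) for N(theta, 1)
bound = wald_lower_bound_h0(alpha, beta, mean_z0)
fixed_n = fixed_sample_size_normal(alpha, beta, theta0, theta1)
print(bound, fixed_n)
```

In this illustration the bound is roughly half the fixed-sample size, which is the order of saving usually associated with the probability ratio sequential test.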

(g) Truncation of Probability Ratio Sequential Test 

If the probability ratio sequential test does not terminate for $n = 1, \ldots,
N - 1$, suppose the following rule is adopted for terminating the test
upon drawing $x_N$:

Accept $\mathcal{H}_0$ if

(15.4.52)    $\log k_0 < z_1 + \cdots + z_N \le 0;$

accept $\mathcal{H}_1$ if

(15.4.53)    $0 < z_1 + \cdots + z_N < \log k_1.$

Let this truncated sequential test be denoted by $S_N$ and let $\alpha_N$ and $\beta_N$
be Type I and Type II errors associated with $S_N$. We shall consider the
problem of determining upper bounds for $\alpha_N$ and $\beta_N$. Let $G_{(N)}^0$ and $G_{(N)}'$
be the events in $R_N$ in which $\mathcal{H}_0$ and $\mathcal{H}_1$ are accepted respectively by
the test $S_N$. Then $G_{(N)}^0$ and $G_{(N)}'$ are disjoint events whose union is $R_N$.
It will be more convenient to consider $G_{(N)}^0$ and $G_{(N)}'$ as cylinder sets in
$R_\infty$. They are disjoint, of course, and their union is $R_\infty$.

We have

(15.4.54)    $P(G_{(N)}' \mid \theta_0) = \alpha_N.$

As before, we let $G^0$ and $G'$ denote the events in $R_\infty$ in which $\mathcal{H}_0$ and
$\mathcal{H}_1$ are accepted respectively, in the (nontruncated) probability ratio
sequential test $S$. We recall that

(15.4.55)    $P(G' \mid \theta_0) = \alpha.$

Let $G'^*$ be the event in $R_\infty$ in which $\mathcal{H}_1$ is accepted in the truncated
case and rejected ($\mathcal{H}_0$ is accepted) in the nontruncated case.

Then

(15.4.56)    $G_{(N)}' \subset (G' \cup G'^*).$

Now if $J$ denotes the event in $R_\infty$ for which (15.4.53) holds we have

$G'^* \subset J,$

and hence

(15.4.57)    $G_{(N)}' \subset (G' \cup J).$

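Now if we choose $N$ large enough to make $z_1 + \cdots + z_N$ approximately
normal, we have

(15.4.58)    $P(0 < z_1 + \cdots + z_N < \log k_1 \mid \theta_0) = \Phi(y_1) - \Phi(y_0) + O(1/\sqrt{N})$

where

(15.4.59)    $y_0 = -\sqrt{N}\, \mathcal{E}(z \mid \theta_0)/\sigma(z \mid \theta_0),$
             $y_1 = \sqrt{N} \left[\dfrac{\log k_1}{N} - \mathcal{E}(z \mid \theta_0)\right] \Big/ \sigma(z \mid \theta_0),$

and $\Phi(y)$ is the c.d.f. of $N(0, 1)$, whereas $\mathcal{E}(z \mid \theta_0)$ and $\sigma^2(z \mid \theta_0)$ are the
mean and variance of the random variable $z = \log \left(\dfrac{f(x; \theta_1)}{f(x; \theta_0)}\right)$ determined
from the p.d.f. $f(x; \theta_0)$.

Therefore, except for terms of order $1/\sqrt{N}$, $\alpha + \Phi(y_1) - \Phi(y_0)$ is an
upper bound for $\alpha_N$. In a similar manner, we find that except for terms
of order $1/\sqrt{N}$, $\beta + \Phi(y_0') - \Phi(y_1')$ is an upper bound for $\beta_N$ where

(15.4.59a)   $y_0' = -\sqrt{N}\, \mathcal{E}(z \mid \theta_1)/\sigma(z \mid \theta_1),$
             $y_1' = \sqrt{N} \left[\dfrac{\log k_0}{N} - \mathcal{E}(z \mid \theta_1)\right] \Big/ \sigma(z \mid \theta_1),$

whereas $\mathcal{E}(z \mid \theta_1)$ and $\sigma^2(z \mid \theta_1)$ are the mean and variance of $z$ determined
from $f(x; \theta_1)$.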

For small values of $\alpha$ and $\beta$, the unknowns $k_0$ (in $y_1'$) and $k_1$ (in $y_1$) can
be closely approximated by $\beta/(1 - \alpha)$ and $(1 - \beta)/\alpha$ respectively, as
indicated in Section 15.4(c).

Summarizing, we have the following result due to Wald (1945):

15.4.9 If the random variable $z = \log \left(\dfrac{f(x; \theta_1)}{f(x; \theta_0)}\right)$
has finite mean and variance whether computed from $f(x; \theta_0)$ or
$f(x; \theta_1)$, an upper bound for the Type I error of the truncated
probability ratio sequential test $S_N$ is

$\alpha + \Phi(y_1) - \Phi(y_0) + O(1/\sqrt{N})$

where $\alpha$ is the Type I error of the nontruncated probability ratio
test $S$, and where $y_0$ and $y_1$ are given by (15.4.59). Similarly, an
upper bound for the Type II error is

$\beta + \Phi(y_0') - \Phi(y_1') + O(1/\sqrt{N})$

where $\beta$ is the Type II error of $S$, whereas $y_0'$ and $y_1'$ are given by
(15.4.59a).


15.5 APPLICATION OF PROBABILITY RATIO SEQUENTIAL 
TEST TO BINOMIAL DISTRIBUTION 

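To illustrate the results of Section 15.4 we consider the case of sampling
from the binomial distribution $Bi(1, \theta)$.

In this case $(x_1, x_2, \ldots)$ is a sequence of independent random variables
whose p.f. is

(15.5.1)    $f(x; \theta) = \theta^x (1 - \theta)^{1 - x}, \qquad x = 0, 1.$

Denoting $\sum_{i=1}^n x_i$ by $n_1$, we have

(15.5.2)    $\displaystyle\prod_{i=1}^n \frac{f(x_i; \theta_1)}{f(x_i; \theta_0)} = \left(\frac{\theta_1}{\theta_0}\right)^{n_1} \left(\frac{1 - \theta_1}{1 - \theta_0}\right)^{n - n_1}.$

Making use of the approximation (15.4.19) for $k_0$ and $k_1$, let $a_n$ and $b_n$ be
defined as follows:

(15.5.3)    $a_n = \dfrac{\log \left(\dfrac{\beta}{1 - \alpha}\right) - n \log \left(\dfrac{1 - \theta_1}{1 - \theta_0}\right)}{\log \left[\dfrac{\theta_1 (1 - \theta_0)}{\theta_0 (1 - \theta_1)}\right]},$

            $b_n = \dfrac{\log \left(\dfrac{1 - \beta}{\alpha}\right) - n \log \left(\dfrac{1 - \theta_1}{1 - \theta_0}\right)}{\log \left[\dfrac{\theta_1 (1 - \theta_0)}{\theta_0 (1 - \theta_1)}\right]}.$

The three inequalities in (15.4.1) then reduce, respectively, to

(15.5.4)    $n_1 \le a_n, \qquad n_1 \ge b_n, \qquad a_n < n_1 < b_n.$

Thus, the sequential process continues as long as $a_n < n_1 < b_n$; it
terminates upon drawing $x_n$ with acceptance of $\mathcal{H}_0$ if $n_1 \le a_n$; it
terminates upon drawing $x_n$ with acceptance of $\mathcal{H}_1$ (rejection of $\mathcal{H}_0$)
if $n_1 \ge b_n$.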
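The stopping rule just described can be sketched in a few lines. This is a minimal illustration, assuming $\theta_1 > \theta_0$ (so that the logarithm in the denominator of $a_n$ and $b_n$ is positive) and using the approximate constants of (15.4.19); the function name and the return convention are invented for the example.

```python
import math


def binomial_sprt(xs, theta0, theta1, alpha, beta):
    """Apply the probability ratio sequential test to 0/1 observations xs.

    Returns (decision, n), where decision is "H0", "H1", or None if the
    observations run out before either boundary is reached.
    """
    denom = math.log(theta1 * (1 - theta0) / (theta0 * (1 - theta1)))  # > 0 when theta1 > theta0
    slope = math.log((1 - theta1) / (1 - theta0))
    log_k0 = math.log(beta / (1 - alpha))
    log_k1 = math.log((1 - beta) / alpha)
    n1 = 0
    for n, x in enumerate(xs, start=1):
        n1 += x                                 # running number of successes
        a_n = (log_k0 - n * slope) / denom      # lower boundary
        b_n = (log_k1 - n * slope) / denom      # upper boundary
        if n1 <= a_n:
            return "H0", n
        if n1 >= b_n:
            return "H1", n
    return None, len(xs)


print(binomial_sprt([1] * 20, 0.3, 0.5, 0.05, 0.05))
print(binomial_sprt([0] * 20, 0.3, 0.5, 0.05, 0.05))
```

Because $a_n$ and $b_n$ are linear in $n$ with a common slope, the continuation region is a band between two parallel lines in the $(n, n_1)$ plane.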

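Making use of the approximations (15.4.19) for $k_0$ and $k_1$ in (15.4.37)
we obtain the following approximation for the operating characteristic
function of our sequential test:

(15.5.5)    $L(\theta) \approx \dfrac{\left(\dfrac{1 - \beta}{\alpha}\right)^{h(\theta)} - 1}{\left(\dfrac{1 - \beta}{\alpha}\right)^{h(\theta)} - \left(\dfrac{\beta}{1 - \alpha}\right)^{h(\theta)}},$

where $h(\theta)$ is a function of $\theta$ defined by applying (15.4.32), that is,

(15.5.6)    $\displaystyle\sum_{x=0}^{1} \left[\frac{\theta_1^x (1 - \theta_1)^{1 - x}}{\theta_0^x (1 - \theta_0)^{1 - x}}\right]^h \theta^x (1 - \theta)^{1 - x} = 1.$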


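Simplifying (15.5.6) we obtain the following functional relationship
between $\theta$ and $h$:

(15.5.7)    $\theta = \dfrac{1 - \left(\dfrac{1 - \theta_1}{1 - \theta_0}\right)^h}{\left(\dfrac{\theta_1}{\theta_0}\right)^h - \left(\dfrac{1 - \theta_1}{1 - \theta_0}\right)^h}.$

It will be noted that $h(\theta_0) = 1$, and $h(\theta_1) = -1$. Furthermore, it will be
seen from (15.5.5) that the approximation formula for $L(\theta)$ gives correct
values for $L(\theta_0)$ and $L(\theta_1)$.

Referring to (15.4.40) and computing the value of $\mathcal{E}(z \mid \theta)$, we find that
the average sample number $\mathcal{E}(n \mid \theta)$ is given approximately by

(15.5.8)    $\mathcal{E}(n \mid \theta) \approx \dfrac{L(\theta) \log \left(\dfrac{\beta}{1 - \alpha}\right) + (1 - L(\theta)) \log \left(\dfrac{1 - \beta}{\alpha}\right)}{\theta \log \left(\dfrac{\theta_1}{\theta_0}\right) + (1 - \theta) \log \left(\dfrac{1 - \theta_1}{1 - \theta_0}\right)}.$

Thus, if the (approximate) expression for $L(\theta)$ in (15.5.5) and the expression
for $\theta$ given by (15.5.7) are substituted into (15.5.8) we obtain an expression
for $\mathcal{E}(n \mid \theta)$ as a function of the parameter $h$. $\mathcal{E}(n \mid \theta)$ can be quite readily
determined for $\theta = 0, \theta_1, \theta_0$.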
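The parametrization by $h$ described above lends itself to direct computation: for each nonzero $h$ one obtains a point $\theta$ together with the approximate values of $L(\theta)$ and of the average sample number. The sketch below follows (15.5.7), (15.5.5) and (15.5.8) in that order; the function name is illustrative, and values of $h$ for which the denominator of (15.5.8) vanishes are not handled.

```python
import math


def binomial_oc_asn(h, theta0, theta1, alpha, beta):
    """Return (theta, L, ASN) for a given h != 0 via (15.5.7), (15.5.5) and (15.5.8)."""
    A = (theta1 / theta0) ** h
    B = ((1 - theta1) / (1 - theta0)) ** h
    theta = (1 - B) / (A - B)                       # (15.5.7)
    k0 = beta / (1 - alpha)
    k1 = (1 - beta) / alpha
    L = (k1 ** h - 1) / (k1 ** h - k0 ** h)         # (15.5.5)
    mean_z = (theta * math.log(theta1 / theta0)
              + (1 - theta) * math.log((1 - theta1) / (1 - theta0)))
    asn = (L * math.log(k0) + (1 - L) * math.log(k1)) / mean_z   # (15.5.8)
    return theta, L, asn


for h in (2.0, 1.0, 0.5, -0.5, -1.0, -2.0):
    print(h, binomial_oc_asn(h, 0.3, 0.5, 0.05, 0.05))
```

Running through a grid of $h$ values in this way traces out the approximate operating characteristic and average sample number curves without ever solving (15.5.6) for $h$.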

Applications of the Wald probability ratio sequential test for sampling 
from various specific distributions have been published by Wald (1947a)
and by the Statistical Research Group, Columbia University (1945, 1947). 


15.6 SEQUENTIAL ESTIMATION 
(a) General Comments 

A general theory of sequential estimation of parameters has not been 
developed. Wald (1947a) gave one formulation of the general problem of 
sequential estimation by intervals but did not solve it. He and Stein (1947) 
however, did consider a sequential procedure for determining a confidence 
interval of fixed length and confidence coefficient for the mean of a 
normal distribution with known variance. Other specific estimation 
problems have also been considered, having aims akin to those of sequential 
estimation. An account of a number of results on these problems has 
been given by Anscombe (1953). 

The basic idea of the sequential estimation of a parameter is in general 
to do just enough sampling to be able to obtain an estimate which has a 
predetermined degree of precision, in some sense which does not depend 
on the unknown population parameter being estimated. Expression of 
degree of precision can be set forth in various ways. One simple way to 
express such precision would be to provide an estimator for the parameter 
whose variance does not depend on the parameter. This can sometimes 
be done by using samples whose size is fixed in advance. In the case of 
large samples an approximate solution of this problem under certain 
regularity conditions was presented in Section 12.3(^). 

Another way to express such precision is for the parameter to have a
confidence interval of length specified in advance of the sampling. This
can be achieved in special cases by a fixed-size sample. For instance,
suppose $\mu$ is unknown but $\sigma^2$ is known in $N(\mu, \sigma^2)$, and that $\bar{x}$ is the mean
of a sample of size $n$ from this distribution. Then $\bar{x} \pm \tfrac{1}{2}\delta$ is a confidence
interval of length $\delta$ having coefficient $\ge \gamma$ for estimating $\mu$, provided $n - 1$
is chosen as the largest integer in $4\sigma^2 y_\gamma^2/\delta^2$ (unless this quantity is an integer,
in which case $n$ is chosen as this integer), where $y_\gamma$ satisfies $\Phi(y_\gamma) -
\Phi(-y_\gamma) = \gamma$, $\Phi(y)$ being the c.d.f. of $N(0, 1)$.
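The fixed-sample rule for known variance amounts to taking the smallest integer $n$ with $n \ge 4\sigma^2 y_\gamma^2/\delta^2$. A minimal sketch follows, using the standard library's normal quantile function; the function name is illustrative.

```python
import math
from statistics import NormalDist


def fixed_sample_size(sigma, delta, gamma):
    """Smallest n for which xbar +/- delta/2 has confidence coefficient >= gamma (sigma known)."""
    y = NormalDist().inv_cdf((1 + gamma) / 2)    # Phi(y) - Phi(-y) = gamma
    return math.ceil((2 * sigma * y / delta) ** 2)


print(fixed_sample_size(1.0, 0.5, 0.95))
```

The required $n$ grows with $\sigma^2$, which is why no fixed sample size can serve when the variance is unknown.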

(b) Stein's Fixed Interval Estimator for Mean of a Normal Distribution

In the case where both $\sigma^2$ and $\mu$ are unknown, Stein (1945) showed
how a double-sampling procedure could be used for establishing a confidence
interval of fixed length $\delta$ for estimating $\mu$ and having confidence
coefficient $\ge \gamma$. His result can be stated as follows:

15.6.1 Let $(x_1, \ldots, x_n)$ be a first sample and $(x_{n+1}, \ldots, x_{n+m})$ a second
independent sample from $N(\mu, \sigma^2)$. Let $s^2$ be the sample variance
of the first sample and $\bar{x}$ the mean of both samples combined. Let
$k = [2 s t_{n-1,\gamma}/\delta]^2$ where $t_{n-1,\gamma}$ is the upper $100 \cdot \tfrac{1}{2}(1 - \gamma)\%$ point of
the Student distribution $S(n - 1)$. Let $m$ be chosen as 0 if $k - n \le
0$ and as the smallest positive integer $\ge k - n$ if $k - n > 0$. Then
$\bar{x} \pm \tfrac{1}{2}\delta$ is a confidence interval of length $\delta$ for $\mu$ having confidence
coefficient $\ge \gamma$.

To establish 15.6.1, it can be readily seen that, even though $\bar{x}$ is the
mean of a sample of size $m + n$ from $N(\mu, \sigma^2)$, where $m$ is a random variable,
$(\bar{x} - \mu)/[\sigma/\sqrt{m + n}]$ has the distribution $N(0, 1)$, and is independent of
$(n - 1)s^2/\sigma^2$, which has the chi-square distribution $C(n - 1)$. Thus,
$(\bar{x} - \mu)\sqrt{m + n}/s$ has the Student distribution $S(n - 1)$, and hence we
have

(15.6.1)    $P\left(-t_{n-1,\gamma} < \dfrac{(\bar{x} - \mu)\sqrt{m + n}}{s} < +t_{n-1,\gamma}\right) = \gamma$

or equivalently

(15.6.2)    $P\left(\bar{x} - \dfrac{s\, t_{n-1,\gamma}}{\sqrt{m + n}} < \mu < \bar{x} + \dfrac{s\, t_{n-1,\gamma}}{\sqrt{m + n}}\right) = \gamma.$

If $s\, t_{n-1,\gamma}/\sqrt{n} \le \tfrac{1}{2}\delta$, then $\bar{x} \pm \tfrac{1}{2}\delta$ is a confidence interval of length $\delta$ for
$\mu$ provided by the mean of the first sample only, and having confidence
coefficient $\ge \gamma$. If, however, $s\, t_{n-1,\gamma}/\sqrt{n} > \tfrac{1}{2}\delta$, we draw a second sample
of size $m$ such that $m$ is the smallest positive integer for which

$\dfrac{s\, t_{n-1,\gamma}}{\sqrt{m + n}} \le \dfrac{\delta}{2},$

that is, $m$ is the smallest positive integer such that

$m \ge \left(\dfrac{2 s\, t_{n-1,\gamma}}{\delta}\right)^2 - n.$

In this case $\bar{x} \pm \tfrac{1}{2}\delta$ is also a confidence interval of length $\delta$ for $\mu$ given
by the mean of both samples combined and having confidence coefficient
$\ge \gamma$, thus establishing 15.6.1.
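The rule for the size of the second sample in 15.6.1 can be sketched as follows. The Python standard library has no Student quantile function, so the two-sided point $t_{n-1,\gamma}$ is taken here as an argument supplied by the caller (for instance from tables); the function name and the data are purely illustrative.

```python
import math
from statistics import variance


def stein_second_sample_size(first_sample, delta, t_crit):
    """Size m of the second sample in 15.6.1.

    t_crit is the Student point t_{n-1, gamma}, supplied by the caller.
    """
    n = len(first_sample)
    s = math.sqrt(variance(first_sample))     # sample standard deviation, divisor n - 1
    k = (2 * s * t_crit / delta) ** 2
    return 0 if k - n <= 0 else math.ceil(k - n)


first = [9.8, 10.4, 10.1, 9.5, 10.7]          # illustrative first sample, n = 5
# 2.776 is the tabulated two-sided 95% point of Student's distribution with 4 degrees of freedom.
print(stein_second_sample_size(first, 0.5, 2.776))
print(stein_second_sample_size(first, 2.0, 2.776))
```

The second sample size depends on the data only through $s$, the standard deviation of the first sample, which is what keeps the Student argument in the proof valid.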


PROBLEMS 


15.1 Show that 15.2.1 holds if $(x_1, x_2, \ldots)$ is a sequence of independent
random variables defined on a finite interval $(a, b)$ and having equal means
$\mathcal{E}(x)$ and if $\mathcal{E}(n)$ is finite.

15.2 Suppose (a?!, arg,...) is a sequence of independent random variables 
having c.d.f.’s Fi(x; 6), F 2 (x\ 0),.... Let be a set such that 



dFi{x\ d) ^pi{0) </?<!. 


If S is any sequential process having the sequence of experiment continuation 
sets Gj, Gg,... defined as follows 


Gn = X E<2) X • • • X 


show that the sequential process terminates with probability 1. 

15.3 A sequence of independent random variables (.Cj, ...) is assumed to be 

drawn from a population having p.d.f. .r, 0 > 0. It is desired to set up a 

Cartesian sequential process for testing the hypothesis that 0 = 0o against 
the hypothesis that 0 = 0^ where 0i > 0^, and where Type I and Type II 
errors are a and respectively. Show that Gg, Gg, and Gq yielded by the proba¬ 
bility ratio criterion are the intervals (xg, oo), (0, where x^ < 

satisfy the conditions 

1 — a 

1 — a 


1 - 


Show that the average sample number S{n\0) for this sequential process attains 
its largest value for 0 « .

15.4 (Continuation) Suppose an r-fold Cartesian sequential process is used
for testing against Show that Gq, Gq, and Gq (in Euclidean r-space) 
determined by the probability ratio criterion are defined as follows: 


Go : , a;,) : ^ x,- > //gj 

^0 • ((•^'1’ • • •»^V) • 2 ^ 


Go : (^’i, //i < 2 < y2h 

1 


where (t/j, satisfy 


and 


1 - a 

/(/y; dy =- /(/y; Oq) dy 

Jyz “ 

J fin; ^i) dy = /(y; Oi) dy 

Ori,r-ie~ev 


15.5 In the Cartesian sequential test discussed in Section 15.3(a), if the process 
has not terminated upon the nth trial suppose it is truncated (arbitrarily termi¬ 
nated) by choosing ^o or *^1 with probabilities a/(a + h)sindbl{a + ^) respectively. 
Show that in this case 


Ln(0) 


a 

a + b 


Mn{0) 


b 

a b 


Hn I 0) 


1 - 


and that Type I and Type II errors a and p satisfy (15.3.8). 

15.6 Prove that the Cartesian sequential test for which Gq is defined by 
(15.3.10) and Gq by (15.3.11) is stronger than any other Cartesian sequential test 

which leaves ao/(ao + ^o)fixedataand<^(«| ^o) (Ihat is, l/(ao + /?o)) '^o- 

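15.7 Suppose $(x_1, x_2, \ldots)$ is a sequence of independent random variables all
having the normal distribution $N(\theta, 1)$. Let $\mathcal{H}_0$ be the hypothesis that $\theta = \theta_0$
and $\mathcal{H}_1$ the hypothesis that $\theta = \theta_1 > \theta_0$. For Type I and Type II errors $\alpha$ and $\beta$,
and taking approximations for $k_0$ and $k_1$ as given by (15.4.19), show that in a
sequential probability ratio test for testing $\mathcal{H}_0$ against $\mathcal{H}_1$, that $\mathcal{H}_0$ is accepted
on the $n$th drawing if, for the first time,

$x_1 + \cdots + x_n \le \dfrac{1}{\theta_1 - \theta_0} \log \left(\dfrac{\beta}{1 - \alpha}\right) + n\, \dfrac{\theta_0 + \theta_1}{2},$

that $\mathcal{H}_1$ is accepted on the $n$th drawing if, for the first time,

$x_1 + \cdots + x_n \ge \dfrac{1}{\theta_1 - \theta_0} \log \left(\dfrac{1 - \beta}{\alpha}\right) + n\, \dfrac{\theta_0 + \theta_1}{2},$

and an $(n + 1)$th drawing is made if $x_1 + \cdots + x_n$ lies between the two values
indicated above.

Also show that

$L(\theta) \approx \dfrac{\left(\dfrac{1 - \beta}{\alpha}\right)^h - 1}{\left(\dfrac{1 - \beta}{\alpha}\right)^h - \left(\dfrac{\beta}{1 - \alpha}\right)^h}$

where

$h = \dfrac{\theta_1 + \theta_0 - 2\theta}{\theta_1 - \theta_0},$

and that this approximation for $L(\theta)$ passes through the points $(\theta_0, 1 - \alpha)$ and
$(\theta_1, \beta)$, as specified by the Type I and Type II errors. Furthermore, show that the
average sample number is

$\mathcal{E}(n \mid \theta) \approx -2\, \dfrac{L(\theta) \log \left(\dfrac{\beta}{1 - \alpha}\right) + (1 - L(\theta)) \log \left(\dfrac{1 - \beta}{\alpha}\right)}{h (\theta_1 - \theta_0)^2}.$

Extend these results to the case of sampling from $N(\theta, \sigma^2)$, where $\sigma^2$ is known.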

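15.8 Suppose $(x_1, x_2, \ldots)$ is a sequence of independent random variables
having the normal distribution $N(0, \sigma^2)$. Let $\mathcal{H}_0$ be the hypothesis that
$\sigma^2 = \sigma_0^2$ and $\mathcal{H}_1$ the hypothesis that $\sigma^2 = \sigma_1^2 > \sigma_0^2$. For Type I and Type II errors $\alpha$
and $\beta$ and using the approximations for $k_0$ and $k_1$ given by (15.4.19) show that
in a sequential probability ratio test for testing $\mathcal{H}_0$ against $\mathcal{H}_1$, that $\mathcal{H}_0$ is accepted
on the $n$th trial if, for the first time,

$x_1^2 + \cdots + x_n^2 \le \dfrac{\sigma_0^2 \sigma_1^2 \left[2 \log \left(\dfrac{\beta}{1 - \alpha}\right) + n \log \left(\dfrac{\sigma_1^2}{\sigma_0^2}\right)\right]}{\sigma_1^2 - \sigma_0^2},$

that $\mathcal{H}_1$ is accepted on the $n$th trial if, for the first time,

$x_1^2 + \cdots + x_n^2 \ge \dfrac{\sigma_0^2 \sigma_1^2 \left[2 \log \left(\dfrac{1 - \beta}{\alpha}\right) + n \log \left(\dfrac{\sigma_1^2}{\sigma_0^2}\right)\right]}{\sigma_1^2 - \sigma_0^2},$

and the $(n + 1)$th observation is taken if $x_1^2 + \cdots + x_n^2$ lies between the two
values given above.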

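15.9 (Continuation) Show that the operating characteristic function $L(\sigma^2)$
is given approximately by

$L(\sigma^2) \approx \dfrac{\left(\dfrac{1 - \beta}{\alpha}\right)^h - 1}{\left(\dfrac{1 - \beta}{\alpha}\right)^h - \left(\dfrac{\beta}{1 - \alpha}\right)^h}$

where the functional relationship between $\sigma^2$ and $h$ is as follows

$\sigma^2 = \dfrac{\sigma_0^2 \sigma_1^2 \left[1 - \left(\dfrac{\sigma_0^2}{\sigma_1^2}\right)^h\right]}{h(\sigma_1^2 - \sigma_0^2)}$

and verify that the graph of this approximation for $L(\sigma^2)$ passes through the
points $(\sigma_0^2, 1 - \alpha)$ and $(\sigma_1^2, \beta)$.

Furthermore, show that the average sample number is given by

$\mathcal{E}(n \mid \sigma^2) \approx \dfrac{2 \sigma_0^2 \sigma_1^2 \left[L(\sigma^2) \log \left(\dfrac{\beta}{1 - \alpha}\right) + (1 - L(\sigma^2)) \log \left(\dfrac{1 - \beta}{\alpha}\right)\right]}{\sigma^2 (\sigma_1^2 - \sigma_0^2) + 2 \sigma_0^2 \sigma_1^2 \log (\sigma_0/\sigma_1)}.$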

15.10 Determine the probability ratio sequential test for the hypothesis
that a sample of size $n$ comes from the Poisson distribution $Po(\theta_0)$, the
alternative being that the sample comes from the Poisson distribution $Po(\theta_1)$
where $\theta_1 > \theta_0$. Find approximations to the operating characteristic function
$L(\theta)$ and the average sample number $\mathcal{E}(n \mid \theta)$ for the test.

15.11 In the binomial waiting time distribution 




let/i = 


k - 1 

«- r 


n = k,k + . 

Show that p is an unbiased estimator for /?, that 


a2(p) = 



2{k - 1 )/ 

{k - 3)(* - 4) 



and hence that the coefficient of variation of p, that is, <j{p)l^{p) is 


;7=(i+o(;,)). 


[See Haldane (1945)]. 

15.12 Let $s^2$ be the variance of a sample of size $n$ from $N(\mu, \sigma^2)$. For an
arbitrary $\varepsilon > 0$ let $m$ be the smallest integer $\ge (n - 1)s^2/[(n - 3)\varepsilon]$. If $\bar{x}'$ is the
mean of an independent sample of size $m$ from $N(\mu, \sigma^2)$ show that $\sigma^2(\bar{x}') \le \varepsilon$.

15.13 Let $s_1^2$ and $s_2^2$ be sample variances of independent samples of sizes $n_1$
and $n_2$ from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ respectively. For arbitrary $\varepsilon > 0$ let $m_1$
be the smallest integer $\ge [2(n_1 - 1)s_1^2]/[(n_1 - 3)\varepsilon]$ and $m_2$ the smallest integer
$\ge [2(n_2 - 1)s_2^2]/[(n_2 - 3)\varepsilon]$. Let further independent samples of sizes $m_1$ and $m_2$
be drawn from $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$ respectively, and let $\bar{x}_1$ and $\bar{x}_2$ be the means of these
samples respectively. Show that $\sigma^2(\bar{x}_1 - \bar{x}_2) \le \varepsilon$.



CHAPTER 16 


Statistical Decision Functions 


16.1 GENERAL REMARKS 

In the theory of testing statistical hypotheses as developed by Neyman 
and Pearson, the risks involved in falsely accepting or falsely rejecting an 
hypothesis are recognized and emphasized. In two pioneering papers 
followed by a book, Wald (1947b, 1949a, 1950) has extended the theory
of such risks to a wider class of statistical problems, developing what he has 
called the theory of statistical decision functions. His development was 
partly motivated by the theory of games as formulated by von Neumann 
and Morgenstern (1944) and partly by a desire to construct a more 
general theory of risk-evaluation involved in statistical procedures. Wald’s 
original work on statistical decision theory has been followed by many 
research papers and several books including a rather comprehensive one 
by Blackwell and Girshick (1954) dealing with statistical decision theory 
for discrete sample spaces. In this chapter we shall present a brief
introduction to some of the main ideas and basic results of statistical
decision theory in the simplest type of situation. We shall not go into 
the theory in its most general form. The reader interested in the general 
theory and further details is referred to the books by Wald and by 
Blackwell and Girshick. Books at a more elementary level have been 
written by Chernoff and Moses (1959), Luce and Raiffa (1957), Raiffa and 
Schlaifer (1961), Savage (1954), and Weiss (1961). 

16.2 DEFINITIONS AND TERMINOLOGY 

Suppose $x$ is a discrete random variable with sample space $R$, and $\theta$
is a point in a parameter space $\Omega$ which contains only a finite set of points
$\theta_1, \ldots, \theta_h$. Let $p(x \mid \theta)$ be the probability distribution of $x$ for a given $\theta$.
Thus we have $h$ probability distributions $p(x \mid \theta_1), \ldots, p(x \mid \theta_h)$, one of
which is assumed to be the true distribution whenever an observation is
made on $x$.

Suppose $a$ is any one of a set $A$ of decisions (or conclusions) that could
be made about $\theta$ on the basis of an outcome of an observation on $x$. For
instance, if $\theta_1, \ldots, \theta_h$ are numbers, and $c$ is some critical number, $A$
might consist of two elements, $a_1$ and $a_2$, where $a_1$ is the decision that
$\theta \le c$ and $a_2$ the decision that $\theta > c$. $A$ could, of course, consist of any
finite number or an infinite number of elements. $A$ is called the decision
space (although it might be more accurate terminology to call it the
conclusion space). If the observation on $x$ is to have any relevance as to
which decision $a \in A$ is made, then $a$ must depend on $x$. We must therefore
have a decision function $d(x)$ to determine what decision $a$ in $A$ to make
if the sample point $x$ occurs. Thus, $d(x)$ is a single-valued function of $x$
having as its value some $a \in A$ for each point $x$ in the sample space $R$.
Any decision function $d(x)$ considered will be a member of some class of
decision functions $D$. It should be noted that for a given decision function
$d(x)$ and for a sample space $R$ having only a finite number of points, there
will be only a finite number of elements in $A$.

Now, for each $\theta \in \Omega$ it is possible to make any decision $a \in A$. If the
consequence of making decision $a$ when $\theta$ is true is such that $L(\theta, a)$ is
the loss or cost, then $L(\theta, a)$, which we shall take as a bounded single-valued
real function, is defined at every point $(\theta, a)$ in the product space
$\Omega \times A$ and is called the loss function. For any decision function $d \in D$
and parameter point $\theta \in \Omega$, the risk function $r(\theta, d)$ is defined as the mean
value of the loss function $L(\theta, d(x))$ over the sample space, that is,

(16.2.1)    $r(\theta, d) = \mathcal{E}[L(\theta, d(x))] = \displaystyle\sum_{x \in R} L(\theta, d(x))\, p(x \mid \theta).$

Thus, $r(\theta, d)$ is defined at every point in the product space $\Omega \times D$ and
is bounded since $L(\theta, d(x))$ is bounded.

Remark. It may be convenient to think of some of the preceding concepts in
terms of elementary game theory concepts. We may think of nature as player I
and the statistician as player II. Nature's strategies or choices are $\theta_1, \ldots, \theta_h$,
whereas the statistician's strategies or choices are the $d$'s in $D$. Thus, for any $\theta$
chosen by nature and any $d$ chosen by the statistician $L(\theta, d(x))$ is the statistician's
loss (nature's pay off) if the sample point $x$ occurs. The quantity $r(\theta, d)$ defined
by (16.2.1) is the average loss to the statistician who adopts $d(x)$ as his strategy.
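For a finite sample space the risk (16.2.1) is a plain weighted sum, as the following sketch shows. The two-point parameter space, the 0-1 loss and the decision rule are invented for the illustration and are not taken from the text.

```python
def risk(theta, d, p, loss, sample_space):
    """r(theta, d) = sum over x in R of L(theta, d(x)) p(x | theta), as in (16.2.1)."""
    return sum(loss(theta, d(x)) * p(x, theta) for x in sample_space)


# Illustrative problem: one Bernoulli observation, theta in {0.2, 0.8}.
sample_space = (0, 1)


def p(x, theta):
    """Probability of x given theta."""
    return theta if x == 1 else 1.0 - theta


def loss(theta, a):
    """0-1 loss: the decision 'high' is correct exactly when theta > 0.5."""
    return 0.0 if (a == "high") == (theta > 0.5) else 1.0


def d(x):
    """Decision function: decide 'high' when a success is observed."""
    return "high" if x == 1 else "low"


risk_vector = [risk(theta, d, p, loss, sample_space) for theta in (0.2, 0.8)]
print(risk_vector)
```

Each decision function thus yields one risk value per parameter point, and it is these values that the comparisons in the remainder of the section operate on.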

Now suppose we are given a certain loss function $L(\theta, a)$, a decision
function $d$, and the associated risk function $r(\theta, d)$. The question now
arises as to what criteria might be used for preferring one decision function
to another in $D$. The risk function itself provides one reasonable criterion.
For suppose $d$ and $d^*$ are two decision functions in $D$. Then comparing
$d$ and $d^*$ on the basis of the risk function, $d^*$ would be preferable to $d$ if

(16.2.2)    $r(\theta, d^*) \le r(\theta, d)$

for all $\theta$ in $\Omega$ and

(16.2.3)    $r(\theta, d^*) < r(\theta, d)$

for at least one $\theta$ in $\Omega$. If (16.2.2) and (16.2.3) hold, $d^*$ is called a uniformly
better decision function than $d$. If no other decision function in $D$ is
uniformly better than $d^*$, then $d^*$ is called an admissible decision function.
A class $D$ of decision functions is called a complete class if for any $d$ not
in $D$ we can find a $d^*$ in $D$ which is uniformly better than $d$.

If a complete class $D$ of decision functions contains no (proper) subclass
which is complete, then $D$ is called a minimal complete class. The concept
of a complete class of decision functions is important in general statistical
decision theory. The relationship between a complete class and a minimal
complete class of decision functions is given in the following theorems
16.2.1 and 16.2.2:
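When the class of decision functions is finite and each risk vector is known, the definitions of "uniformly better" and "admissible" can be checked mechanically. The sketch below assumes a table mapping hypothetical decision-function names to risk vectors $(r(\theta_1, d), \ldots, r(\theta_h, d))$; the names and numbers are invented for the illustration.

```python
def uniformly_better(r_star, r):
    """True if r_star <= r in every component and < in at least one, per (16.2.2) and (16.2.3)."""
    return (all(a <= b for a, b in zip(r_star, r))
            and any(a < b for a, b in zip(r_star, r)))


def admissible(risks):
    """Names in the finite table `risks` on which no other entry is uniformly better."""
    return [name for name, r in risks.items()
            if not any(uniformly_better(r2, r)
                       for name2, r2 in risks.items() if name2 != name)]


# Illustrative risk vectors for four decision functions and two parameter points.
risks = {"d1": (0.2, 0.2), "d2": (0.0, 1.0), "d3": (1.0, 0.0), "d4": (0.8, 0.8)}
print(admissible(risks))
```

Here the last entry is dropped because the first is uniformly better than it, while the other three are mutually incomparable and therefore all admissible within this table.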

16.2.1 If a minimal complete class exists, it is equal to the class of admissible 
decision functions. 

Let $D_0$ denote the class of admissible decision functions. Then if $D$ is
a minimal complete class, $D_0$ is a subset of $D$. Now suppose $d'$ is an
element of $D$ which does not belong to $D_0$. Then there is a $d''$ which is
uniformly better than $d'$. But $d''$ cannot be an element in $D$ since $D$ is a
minimal complete class. Hence there is an element $d'''$ in $D$ that is uniformly
better than $d''$ and therefore uniformly better than $d'$. But this is
impossible since $D$ is a minimal complete class. Hence $D_0$ cannot be a
proper subset of $D$ and therefore $D_0$ and $D$ are identical.

If the class $D_0$ of all admissible decision functions is complete then $D_0$
is a minimal complete class. It follows from this fact and 16.2.1 that
16.2.2 A necessary and sufficient condition for the existence of a minimal 
complete class of decision functions is that the class of admissible 
decision functions be complete. 

Wald (1950) has shown under rather general conditions that the set of 
admissible decision functions forms a complete class, and, in particular, 
that the set of all Bayes solutions (to be considered later) is complete. 

16.3 MINIMAX SOLUTION OF THE DECISION PROBLEM 

Let us return to the risk function $r(\theta, d)$. For any given decision function
$d$, and loss function $L(\theta, a)$, the risk vector associated with the possible
choices of $\theta$, namely, $\theta_1, \ldots, \theta_h$, is $(r(\theta_1, d), \ldots, r(\theta_h, d))$. Now consider
the maximum component of this vector. It is a function of $d$. Suppose
$d^*$ is a decision function in a class $D$ which minimizes this maximum
component. In other words, suppose $d^*$ is a member of $D$ such that

(16.3.1)    $\underset{\theta \in \Omega}{\text{l.u.b.}}\; r(\theta, d^*) \le \underset{\theta \in \Omega}{\text{l.u.b.}}\; r(\theta, d)$

for all $d$ in $D$. Then $d^*$ or its associated risk vector $(r(\theta_1, d^*), \ldots,
r(\theta_h, d^*))$ is called the minimax solution of our decision problem.

If D contains only a finite number of elements d it is evident that there
exists at least one minimax solution d* and the risk associated with such
a solution would be

(16.3.2)  $\min_{d \in D} \left[ \max_{\theta \in \Omega} r(\theta, d) \right]$.

The d (or d's) in D which yield this minimax risk
could be found by exhaustively examining the values of $r(\theta, d)$ at the finite
set of points in $\Omega \times D$ in the order indicated by (16.3.2).
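
The exhaustive examination indicated by (16.3.2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the risk values are taken as already tabulated, the function name `minimax_decision` is hypothetical, and the numbers in `RISK_TABLE` are invented for the sketch and are not taken from the text.

```python
import numpy as np


def minimax_decision(risk):
    """Return (k, value): index of a minimax decision function and its risk.

    risk[k][i] holds r(theta_i, d_k); each row is one decision function.
    """
    risk = np.asarray(risk, dtype=float)
    worst_case = risk.max(axis=1)   # inner step of (16.3.2): max over theta
    k = int(worst_case.argmin())    # outer step of (16.3.2): min over d
    return k, float(worst_case[k])


# Hypothetical risk vectors, one row per decision function (assumed values).
RISK_TABLE = [[1.0, 3.0], [2.0, 2.5], [3.0, 2.75], [4.0, 2.0]]

if __name__ == "__main__":
    print(minimax_decision(RISK_TABLE))
```

If several decision functions share the same smallest maximum, `argmin` reports only the first of them; listing all ties would need one further comparison against the minimax value.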

If, however, D contains an infinite number of elements the situation is
more complicated. In this case a geometric representation will help. For
each d in D the h-tuple $(r(\theta_1, d), \ldots, r(\theta_h, d))$ can be represented as a
point r in Euclidean space $R_h$. The set of all such points r corresponding
to all d in D is a set E in $R_h$. E is bounded since r is bounded on $\Omega \times D$.
Thus the problem of finding a minimax solution of the decision problem
in this case is equivalent to taking the maximum coordinate of each point
r in E and then minimizing these maximum coordinates. If a solution d*
exists it can be expressed in the following way. Let $E_i$ be the subset of E
for which $r(\theta_i, d) \ge \max_{j \ne i} \{ r(\theta_j, d) \}$, $i = 1, \ldots, h$. Then
$E = E_1 \cup E_2 \cup \cdots \cup E_h$, although it should be noted that $E_1, \ldots, E_h$ are not necessarily
disjoint, since any point in E having two or more equal largest coordinates
would belong to two or more of the sets $E_1, \ldots, E_h$. Now let $M_i$ be the
greatest lower bound of the ith coordinate of all points in $E_i$, $i = 1, \ldots, h$.
Then if there were a d* in D (a point in E) such that

(16.3.3)  $\max_i r(\theta_i, d^*) = \min \{ M_1, \ldots, M_h \}$

d* would be a minimax solution. If E is closed, that is, if E contains the
limit points of all sequences of points in E, then there does exist at least
one d*.

We shall show that E is closed under some mild conditions. First, note
that for any point in $D \times R$, $(L(\theta_1, d(x)), \ldots, L(\theta_h, d(x)))$ is a point L
in Euclidean space $R_h$. Let F be the set of all such points L. F is bounded
since $L(\theta, d(x))$ is bounded. We shall show that

16.3.1 If F is closed E is also closed. 

First, consider the case where R has a finite number of points. In this
case A will have a finite number of points. Since d(x) is single-valued,
$L(\theta_i, d(x))$ will have a finite number of values and so will $r(\theta_i, d)$, since
$r(\theta_i, d)$ is the average or mean value of $L(\theta_i, d(x))$ over the sample space R
with respect to $p(x \mid \theta_i)$, $i = 1, \ldots, h$. Therefore, if R contains a finite
number of points, E contains a finite number of points and is therefore
closed, and a minimax solution to the decision problem therefore exists
in this case.

Now consider the case in which R has a countably infinite number of
points. Let these points be arranged in sequence $x_t$, $t = 1, 2, \ldots$. Then
for any d in D (and hence its corresponding point r in E) we have

(16.3.4)  $r(\theta_i, d) = \sum_{t=1}^{\infty} L(\theta_i, d(x_t))\, p(x_t \mid \theta_i)$.

Thus d is specified by the sequence of points

(16.3.5)  $(L(\theta_1, d(x_t)), \ldots, L(\theta_h, d(x_t)))$,  $t = 1, 2, \ldots$

in Euclidean $R_h$. All such sequences corresponding to all d in D generate
the set of points F in $R_h$ which is bounded since $L(\theta, d(x))$ is bounded.
We assume F to be closed.

Consider next any convergent sequence of points $r_\alpha$, $\alpha = 1, 2, \ldots$ in E
where $r_\alpha$ has the coordinates

(16.3.6)  $(r(\theta_1, d_\alpha), \ldots, r(\theta_h, d_\alpha))$,

and let its limit point r* be denoted by

(16.3.7)  $(r(\theta_1, d^*), \ldots, r(\theta_h, d^*))$.

We shall show that r* also belongs to E, and hence that E is closed. The
sequence of points in F corresponding to $d_\alpha$ is

(16.3.8)  $(L(\theta_1, d_\alpha(x_t)), \ldots, L(\theta_h, d_\alpha(x_t)))$,  $t = 1, 2, \ldots$

For $\alpha = 1, 2, \ldots$ (16.3.8) is thus a double sequence of points in F.
Using the Cantor diagonal procedure for this double sequence, a subsequence
$d_{\alpha_\beta}$, $\beta = 1, 2, \ldots$ can be chosen from the sequence $d_\alpha$, $\alpha = 1, 2, \ldots$
such that for each t the sequence of points

(16.3.9)  $(L(\theta_1, d_{\alpha_\beta}(x_t)), \ldots, L(\theta_h, d_{\alpha_\beta}(x_t)))$

in F converges to

(16.3.10)  $(L(\theta_1, d^*(x_t)), \ldots, L(\theta_h, d^*(x_t)))$

as $\beta \to \infty$. Since F is closed, all points (16.3.10) for $t = 1, 2, \ldots$ are in F.
Now consider the risk function

(16.3.11)  $r(\theta, d_{\alpha_\beta}) = \sum_{t=1}^{\infty} L(\theta, d_{\alpha_\beta}(x_t))\, p(x_t \mid \theta)$.

Taking the limit as $\beta \to \infty$ and in view of the convergence of the sequence
(16.3.9) to that in (16.3.10) for every t, we have

(16.3.12)  $\lim_{\beta \to \infty} r(\theta, d_{\alpha_\beta}) = \lim_{\beta \to \infty} \sum_{t=1}^{\infty} L(\theta, d_{\alpha_\beta}(x_t))\, p(x_t \mid \theta)$
           $= \sum_{t=1}^{\infty} \lim_{\beta \to \infty} L(\theta, d_{\alpha_\beta}(x_t))\, p(x_t \mid \theta)$
           $= \sum_{t=1}^{\infty} L(\theta, d^*(x_t))\, p(x_t \mid \theta)$.

The interchange of lim and $\sum$ is valid since the loss function is bounded
and $p(x_t \mid \theta)$ is a probability distribution. But since the points in (16.3.10)
are in F, it follows that $\sum_{t=1}^{\infty} L(\theta, d^*(x_t))\, p(x_t \mid \theta)$, which may be denoted by
$r(\theta, d^*)$, is the risk of a decision function d* in D, so that the limit point r* is in E. Hence E is closed.

Thus, if the set F in $R_h$ consisting of the points $(L(\theta_1, d(x)), \ldots,
L(\theta_h, d(x)))$ for all $d \in D$ and $x \in R$ is closed, the set E in $R_h$ consisting
of the points $(r(\theta_1, d), \ldots, r(\theta_h, d))$ for all $d \in D$ is also closed, and
there exists a minimax solution d* in D for our statistical decision
problem.

Application of the minimax procedure to specific problems usually 
requires a considerable amount of computation. We select a very simple 
example to illustrate the procedure. 

Example. Suppose x is a discrete random variable with sample space (0, 1)
which has the distribution 

/’(•'• 10.) = I - 0i = i 0, = j 

and let the decision space A consist of two elements $a_1$ and $a_2$. Let the loss
function $L(\theta_i, a_j)$ be defined as follows:


              $a_1$    $a_2$
$\theta_1$      1        4
$\theta_2$      3        2
Now the possible decisions for each sample point x are $a_1$ and $a_2$. If we permit
either decision $a_1$ or $a_2$ at each sample point then there are four possible decision
functions $d_\alpha(x)$, $\alpha = 1, \ldots, 4$, on the sample space where

$d_1(0) = a_1$    $d_1(1) = a_1$
$d_2(0) = a_1$    $d_2(1) = a_2$
$d_3(0) = a_2$    $d_3(1) = a_1$
$d_4(0) = a_2$    $d_4(1) = a_2$

The four risk vectors $(r(\theta_1, d_\alpha), r(\theta_2, d_\alpha))$, $\alpha = 1, \ldots, 4$ are found by applying
(16.2.1); that is,

$r(\theta_i, d_\alpha) = \sum_{x=0}^{1} L(\theta_i, d_\alpha(x))\, p(x \mid \theta_i)$,  $i = 1, 2$;  $\alpha = 1, \ldots, 4$.

Inserting numerical values of d^(x), L(0^, £ 4 ) and pix | 0,.) we find the numerical 
values of the 4 risk vectors to be (1,3), (|, f), (^, f), (4, 2). The maximum 
coordinates of the four vectors are 3, f, 4, respectively, and the minimum 
of these is f. The solution occurs for the risk vector (r(0i, d 2 ), r( 02 , <^ 2 ))* 
hence the minimax solution is the decision function d 2 (^) and the corresponding 
risk vector is (J, |). 


16.4 BAYES SOLUTIONS OF THE STATISTICAL 
DECISION PROBLEM 


(a) Solution against a Specified a priori Distribution 

Now suppose a value of $\theta$ occurs in accordance with some a priori
probability function $q(\theta)$. Then for a given decision
function d in the available class D, the risk function $r(\theta, d)$ would be
averaged over $\theta$ with respect to the a priori distribution $q(\theta)$ to yield the
average risk r(q, d) defined as follows:

(16.4.1)  $r(q, d) = \sum_{i=1}^{h} r(\theta_i, d)\, q(\theta_i) = \sum_{t=1}^{\infty} \sum_{i=1}^{h} L(\theta_i, d(x_t))\, p(x_t \mid \theta_i)\, q(\theta_i)$.

If we knew $q(\theta)$, and could find a decision function d* in D such that

(16.4.2)  $r(q, d^*) \le r(q, d)$

for any other d in D, then d* would be regarded as an optimum solution
of the statistical decision problem. A decision function which thus
minimizes r(q, d) is called a Bayes solution relative to the particular
a priori distribution $q(\theta)$.

Note that since $q(\theta_i) \ge 0$, $i = 1, \ldots, h$, and $q(\theta_1) + \cdots + q(\theta_h) = 1$,
then $(q(\theta_1), \ldots, q(\theta_h))$ can be represented as a point on the $(h - 1)$-dimensional
simplex G in Euclidean space $R_h$ which is spanned by the h
points (1, 0, ..., 0), (0, 1, 0, ..., 0), ..., (0, ..., 0, 1).

Now suppose we have some unknown nondegenerate a priori distribution.
How do we minimize the risk?

We can represent r(q, d) as

(16.4.3)  $r(q, d) = \sum_{t=1}^{\infty} \sum_{i=1}^{h} L(\theta_i, d(x_t))\, Q(\theta_i \mid x_t)\, P(x_t)$

where

(16.4.4)  $P(x_t) = \sum_{i=1}^{h} p(x_t \mid \theta_i)\, q(\theta_i)$

and

(16.4.5)  $Q(\theta_i \mid x_t) = \dfrac{p(x_t \mid \theta_i)\, q(\theta_i)}{P(x_t)}$.

$Q(\theta_i \mid x_t)$ is the a posteriori probability that $\theta = \theta_i$, given that $x = x_t$ and
given the a priori distribution $q(\theta_i)$.

Now for a given $q(\theta)$ and t, let

(16.4.6)  $\sum_{i=1}^{h} L(\theta_i, d^*(x_t))\, Q(\theta_i \mid x_t)$

be the greatest lower bound of

(16.4.7)  $\sum_{i=1}^{h} L(\theta_i, d(x_t))\, Q(\theta_i \mid x_t)$

for all points $(L(\theta_1, d(x_t)), \ldots, L(\theta_h, d(x_t)))$ in F for a fixed t. Note that
for fixed t, $(Q(\theta_1 \mid x_t), \ldots, Q(\theta_h \mid x_t))$ is a point in G. Thus the sequence
of points $x_1, x_2, \ldots$ in the sample space R determines a sequence of
vectors in F and a corresponding sequence in G. If F is extended to include
the vectors

(16.4.8)  $(L(\theta_1, d(x_t))\, Q(\theta_1 \mid x_t), \ldots, L(\theta_h, d(x_t))\, Q(\theta_h \mid x_t))$

for each t and for all d in D, and also the minimizing vector

(16.4.9)  $(L(\theta_1, d^*(x_t))\, Q(\theta_1 \mid x_t), \ldots, L(\theta_h, d^*(x_t))\, Q(\theta_h \mid x_t))$

for each t, then E, and hence D, are closed and the expression in (16.4.6)
will provide a minimum for the expression in (16.4.7). In other words, D
will contain at least one d* so that

(16.4.10)  $\sum_{t=1}^{\infty} \sum_{i=1}^{h} L(\theta_i, d^*(x_t))\, Q(\theta_i \mid x_t)\, P(x_t) \le r(q, d)$

for any d in D. Each such d* provides a Bayes solution of the decision
problem against the a priori distribution $q(\theta)$. The set of such d* is
nonempty and closed and hence the class of admissible decision functions
d* is complete.

Summarizing, we have 

16.4.1 If, for a given a priori distribution $q(\theta)$, F is extended to include
the vectors (16.4.8) for each t and all $d \in D$ and (16.4.9) for each t,
then E is closed and d* corresponding to the sequence of vectors
$(L(\theta_1, d^*(x_t)), \ldots, L(\theta_h, d^*(x_t)))$ defined in (16.4.6) minimizes the
average risk r(q, d) defined in (16.4.1).
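
The pointwise minimization in (16.4.6) can be illustrated for a finite sample space and a finite decision space. The following Python sketch rests on assumptions that are not in the text: the names `average_risk` and `bayes_decision` are hypothetical, the decision space is taken to be a finite list of actions, every sample point is assumed to have positive probability, and the loss table, distributions and prior are invented numbers. Since $Q(\theta_i \mid x_t) P(x_t) = p(x_t \mid \theta_i) q(\theta_i)$, the sketch minimizes the weighted loss separately at each sample point.

```python
import itertools

import numpy as np


def average_risk(d, loss, p, q):
    """r(q, d) = sum_t sum_i L(theta_i, d(x_t)) p(x_t | theta_i) q(theta_i)."""
    loss, p, q = (np.asarray(a, dtype=float) for a in (loss, p, q))
    return float(sum(loss[i, d[t]] * p[i, t] * q[i]
                     for t in range(p.shape[1]) for i in range(q.size)))


def bayes_decision(loss, p, q):
    """Return (d_star, r(q, d_star)) by minimizing at each sample point.

    loss[i, j] = L(theta_i, a_j), p[i, t] = p(x_t | theta_i), q[i] = q(theta_i).
    d_star[t] is the index of the action chosen at x_t.
    """
    loss, p, q = (np.asarray(a, dtype=float) for a in (loss, p, q))
    joint = p * q[:, None]        # p(x_t | theta_i) q(theta_i) = Q(theta_i | x_t) P(x_t)
    weighted = loss.T @ joint     # weighted[j, t] = sum_i L(theta_i, a_j) Q(theta_i | x_t) P(x_t)
    d_star = [int(j) for j in weighted.argmin(axis=0)]
    return d_star, float(weighted.min(axis=0).sum())


# Hypothetical inputs (assumed for illustration only).
LOSS = [[0.0, 2.0], [1.0, 0.0]]
P = [[0.7, 0.3], [0.2, 0.8]]
Q = [0.5, 0.5]

if __name__ == "__main__":
    print(bayes_decision(LOSS, P, Q))
```

For small problems the result can be checked against `average_risk` evaluated over every decision function generated by `itertools.product`; the two minima should agree up to rounding.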

(b) Geometrical Interpretation 

If we denote the left-hand side of (16.4.10) by r(q, d*), then we have

(16.4.11)  $r(q, d^*) = \min_{d \in D} r(q, d) = \min_{r \in E} \sum_{i=1}^{h} r_i\, q(\theta_i)$.

Thus, if the weighted average of the coordinates of each point (risk vector)
in E is taken, using the a priori probabilities $q(\theta_1), \ldots, q(\theta_h)$ as weights,
then r(q, d*) is equal to the smallest of these weighted averages. Or stated
geometrically, if, for a given $q(\theta)$, one takes the family of hyperplanes
(C being the parameter)

(16.4.12)  $\sum_{i=1}^{h} r_i\, q(\theta_i) = C$

in $R_h$ passing through all points in E, the point $r = (r_1, \ldots, r_h)$ yielding
the smallest C provides the Bayes solution against the a priori distribution
$q(\theta)$, and the value of C for this point is the minimum average
risk r(q, d*). All coordinates of r are finite since E is bounded.

(c) The Set of Solutions against All Possible a priori Distributions

Now let us examine the set of solutions corresponding to all possible
a priori distributions. Note that

(16.4.13)  $(q(\theta_1), \ldots, q(\theta_h))$

is a point in the simplex G in $R_h$ spanned by the h points (1, 0, ..., 0),
..., (0, ..., 0, 1). The coordinates of the point (16.4.13) are direction
numbers of the normal to the hyperplane (16.4.12). Thus, if we find the
point r in E yielding the smallest C for each possible a priori distribution
$q(\theta)$, then the set of hyperplanes corresponding to this set of smallest C's
forms a lower bounding hull or envelope for the set E. It is evident that
the set E* of all points r in E which are used in determining the bounding
hyperplanes for the hull lies on the hull itself. The set of risk vectors E*
is the set of all Bayes solutions of the decision problem generated by all
possible a priori distributions $q(\theta)$.
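
For a parameter space with two points, the set E* can be approximated numerically by sweeping the a priori distribution over a grid on the simplex G and recording which risk vector yields the smallest C in (16.4.12). The sketch below is illustrative only: the function name `bayes_solutions_over_priors`, the grid size and the risk vectors are assumptions and are not taken from the text, and a finite grid can miss a risk vector that is a Bayes solution only for a very narrow range of priors.

```python
import numpy as np


def bayes_solutions_over_priors(risk, n_grid=101):
    """Indices of risk vectors that minimize the average risk for some prior.

    risk[k] = (r(theta_1, d_k), r(theta_2, d_k)); priors are (w, 1 - w)
    with w on an evenly spaced grid over [0, 1].
    """
    risk = np.asarray(risk, dtype=float)
    found = set()
    for w in np.linspace(0.0, 1.0, n_grid):
        q = np.array([w, 1.0 - w])              # a point of the simplex G
        found.add(int((risk @ q).argmin()))     # smallest C in (16.4.12)
    return sorted(found)


# Hypothetical risk vectors, one row per decision function (assumed values).
RISK_VECTORS = [[1.0, 3.0], [2.0, 2.5], [3.0, 2.75], [4.0, 2.0]]

if __name__ == "__main__":
    print(bayes_solutions_over_priors(RISK_VECTORS))
```

With these assumed values the third risk vector is never selected, since the second is smaller in both coordinates; it lies above the lower bounding hull.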

Example. At the end of Section 16.3 we gave a simple numerical example to 
illustrate the problem of obtaining a minimax solution. Let us return to that 
example and assume that 0 has the a priori distribution q{&) where q{0^) = 
^(^2) = I- Recalling that the risk vectors obtained in the minimax example 
corresponding to decision functions d^, d^ are (1, 3), (-|, f), (\-, \)y (4, 2), 

respectively, we now calculate 2 d)q{d-) the inner products of each of 

i 

these vectors and the vector (q{0i), q{d^) = (|). The four inner products are 
il» i 2 i a>id ?|. They are the average risks associated respectively 
with di, f/a, d^ against qifl). Thus the decision function yielding the minimum 
average risk against the particular qid) chosen is If the bounding hull as 
explained above is constructed from the four risk vectors it will be found that the 
hull itself contains d^, r/g and d^. This means that d^ or d^ will provide a 
Bayes solution against any possible a priori distribution q{d), 

16.5 REMARKS ON EXTENSIONS AND GENERALIZATIONS 

We have discussed statistical decision theory only for the simplest case,
namely, that in which the random variable x is discrete, where the parameter
space $\Omega$ contains a finite number of points, and where the random
variable x is observed once. (Notice that x could be an n-dimensional
random variable, however. For instance, x could be a sample of size n
drawn from a Poisson distribution with parameter $\theta$. The parameter $\theta$,
however, would be limited to a finite number of values.) We have derived
general minimax and Bayes solutions in this relatively simple case.

There are various directions in which the results obtained can be extended
and generalized. The first might be to extend the results to the case where
x is an absolutely continuous random variable, thus possessing a density
function $f(x \mid \theta)$ over the sample space, with the parameter space $\Omega$ held
to a finite number of values. The next direction would be to let $\Omega$ have a
countably infinite number of values, or even be a set of points in a
Euclidean space with x being either discrete or continuous. Another
direction for generalization is to introduce sequential sampling. To
attempt the general theory for these various extensions here, however,
would fall outside the scope of this book. Readers interested in these
various extensions should refer to the books by Wald (1950), and by
Blackwell and Girshick (1954), and to papers by Dvoretzky, Kiefer and
Wolfowitz (1953a, 1953b), by Karlin and Rubin (1956), and by Lehmann
(1957).

PROBLEMS


16.1 Suppose x is a random variable which has the binomial distribution
$p(x \mid \theta) = \binom{2}{x} \theta^x (1 - \theta)^{2-x}$,  x = 0, 1, 2


and that the parameter space of 0 contains two points = roo* ^2 = ‘A*- 
Let the decision space A contain two elements and let the loss function 
^(^ 1 , he defined as follows: 


Ox 


Let the space D of decision functions contain three ppints d^(x)^ and 
where 

cr =0, l,...,a - 1 

. 

[02 Otherwise 

a = 1, 2, 3. Determine the decision function that provides the minimax solution 
for this decision problem. 

16.2 {Continuation) Find the decision functions which provide Bayes 
solutions of the decision problem against all possible a priori distributions of 0 
over the points 0^ = O 2 = iV- 

16.3 Suppose x is a random variable having p.f.

$p(x \mid \theta) = (1 - \theta)\, \theta^{x-1}$,  x = 1, 2, ...

and the parameter space H contains two points 0^ = ^2 = Let the 

decision space A contain two elements {a^, and let the loss function Z.(0„ Qj) 
be defined as follows: 

O 2 

Let the space D of decision functions contain the following points: 

Oi .r = 1,. .., a 
a2 ir = a + 1, a 4* 2, . . . 

a =c 1, 2. Determine which decision function provides the minimax solution 
for the decision problem. 

16.4 {Continuation) Find the decision functions which provide Bayes 
solutions of the decision problem against all possible a priori distributions of 0, 
assuming, of course, that the space of 0 has only two points 0^ = Og ~ A- 
16.5 Suppose x is a random variable that has one of the p.f.'s $p(x \mid \theta_1)$ or
$p(x \mid \theta_2)$. Let the decision space A contain two elements $(a_1, a_2)$ and let the loss
function $L(\theta_i, a_j)$ be as follows:

              $a_1$    $a_2$
$\theta_1$      0        c
$\theta_2$      1        0

Show that of all possible decision functions a decision function d(x) which
provides a Bayes solution against the a priori distribution $q(\theta)$ is as follows:
$d(x) = a_1$ for all values of x satisfying the likelihood ratio inequality

$\dfrac{p(x \mid \theta_2)}{p(x \mid \theta_1)} \le c\, \dfrac{q(\theta_1)}{q(\theta_2)}$

while $d(x) = a_2$ for all other values of x.




CHAPTER 17 


Time Series 


17.1 INTRODUCTORY REMARKS 

Many problems occur in science and technology in which a process 
produces what we may idealize as a family of random variables (stochastic 
process) such that there is a random variable for each value of t (time)
on some interval T, thus generating a random function of time. Such 
functions are called time series. Examples of time series are: voltage in a 
circuit over a period of seconds; noise in a factory over a period of 
minutes; height of sea waves over a period of hours. 

Other time series show characteristics of random fluctuations superimposed
over some smooth trend or time function. Examples of such time
series are: temperature in a city over a 24-hour period; stock prices over 
a period of months; national employment over a period of years. 

Various statistical and mathematical methods have been developed over 
a period of many years for studying and analyzing time series. Many of 
these methods have been developed to estimate the smooth trend function 
supposedly underlying the series by “averaging out,” in some more or less 
empirical sense, the random fluctuations in the series. 

In recent years, however, considerable attention has been given to the 
study of time series as stochastic processes. Such an approach often 
provides more insight into mechanisms by which time series are or can be 
generated than a more empirical approach. 

There is already a large body of literature, including several books, 
which deals with time series from the point of view of stochastic processes. 
Here we shall give only a short introduction to the subject, including a few 
of the simpler basic results. Further details and results will be found in 
various books, particularly those by Bartlett (1955), Doob (1953), 
Grenander and Rosenblatt (1957) and Wold (1938).

17.2 STATIONARY TIME SERIES

It will be recalled from Section 4.1 that a stochastic process is defined
as a family of random variables $\{x_t;\ t \in T\}$, such that for every finite set of
choices of $t \in T$, say $t_1, \ldots, t_n$, the set $x_{t_1}, \ldots, x_{t_n}$ of random variables has a
joint probability distribution function.

For time series we may think of t as the time index and T as the entire
time axis (real axis) or any interval or set on the time axis. Thus, in some
cases T might consist of only a sequence of equally spaced points along
the time axis. For a given time unit, we shall be particularly interested in
the case where $T = \ldots, -1, 0, +1, \ldots$, and samples consisting of finite
blocks of T such as $1, \ldots, n$ or $1, \ldots, M$ or $n - h, n - h + 1, \ldots, n - 1$
or $1, \ldots, 2k + 1$.

An arbitrary stochastic process $\{x_t;\ t \in T\}$, where t represents a time
index, is entirely too general to discuss usefully. We shall confine ourselves
to what are called stationary processes or stationary time series. There are
two important types which we shall consider.

A strictly stationary time series $\{x_t;\ t \in T\}$ has the property that any
finite set

(17.2.1)  $x_{t_1}, x_{t_2}, \ldots, x_{t_n}$

of random variables from the family $\{x_t;\ t \in T\}$ has the same joint distribution
function as the set

(17.2.2)  $x_{t_1 + h}, x_{t_2 + h}, \ldots, x_{t_n + h}$

for any h. (It should be pointed out that $x_t$ can be complex or a k-dimensional
random variable although we shall be concerned almost
entirely with the case where $x_t$ is a one-dimensional random variable.)
Thus, the joint distributions of the sets of random variables (17.2.2) for
different values of h are all identical and depend only on the time differences

(17.2.3)  $t_2 - t_1,\ t_3 - t_2,\ \ldots,\ t_n - t_{n-1}$.

The reader can readily verify that

17.2.1 If $\{x_t;\ t \in T\}$ is a strictly stationary time series with $\mathcal{E}(|x_t|) < \infty$
then $\mathcal{E}(x_t) = c$, a constant for all t.

The following is an illustrative example of a strictly stationary time 
series. 

Example. Consider the (real) time series

(17.2.4)  $x_t = \sum_{p=1}^{r} a_p \cos(t\omega_p + u_p)$,  $t = \ldots, -1, 0, +1, \ldots$

where $a_1, \ldots, a_r$ are real constants, $\omega_1, \ldots, \omega_r$ are constants on the interval
$[-\pi, +\pi]$ which we may take as ordered $\omega_1 < \cdots < \omega_r$, and $u_1, \ldots, u_r$ are
independent random variables, each having a rectangular distribution on the
interval $[-\pi, +\pi]$. Take any finite set of values of t, say $t_1, \ldots, t_n$. We then
have

(17.2.5)  $x_{t_i + h} = \sum_{p=1}^{r} a_p \cos(t_i \omega_p + v_p)$,

where the $v_p$ are independent random variables uniformly distributed on the
intervals $[h\omega_p \pm \pi]$, $p = 1, \ldots, r$. It is evident that the joint distribution of
$x_{t_1 + h}, \ldots, x_{t_n + h}$ is identical with that of $x_{t_1}, \ldots, x_{t_n}$.
Thus, the time series (17.2.4) is strictly stationary.

It should be noted that a natural complex version of the time series (17.2.4) is

(17.2.6)  $x_t = \sum_{p=1}^{r} a_p e^{i(t\omega_p + u_p)}$,  $t = \ldots, -1, 0, +1, \ldots$

which, of course, is also strictly stationary.

Some of the most useful studies of time series are those which do not
require the assumption of strict stationarity but are based on the weaker
assumptions that: (i) for all t, $\mathcal{E}(x_t)$ is a constant which may be taken as 0,
and (ii) the distributions in (17.2.2) have the same covariance matrix for
all h. A time series $\{x_t;\ t \in T\}$ satisfying these two conditions is said to be
weakly stationary. This means that the covariance matrix depends only
on the time differences (17.2.3) and hence the covariance of $x_{t+h}$ and $x_t$
(taking $\mathcal{E}(x_t) = 0$) will be a function of h only, namely

(17.2.7)  $\mathcal{E}(x_{t+h} x_t) = \gamma_h$.

The covariance $\gamma_h$ considered as a function of h is called the covariance
function of the time series $\{x_t;\ t \in T\}$; $\gamma_h$ is sometimes called the lag
covariance or auto-covariance with lag h. Note that the correlation
coefficient between $x_{t+h}$ and $x_t$ is $\gamma_h / \gamma_0 = \rho_h$, say. Considered as a function
of h, $\rho_h$ is called the autocorrelation function or the serial correlation
function of the time series.

If $x_t$ is complex, its covariance function $\gamma_h$ is defined by

(17.2.8)  $\gamma_h = \mathcal{E}(x_{t+h} \bar{x}_t)$

where $\bar{x}_t$ is the complex conjugate of $x_t$.

The reader will readily see that: 

17.2.2 A strictly stationary time series with finite covariance matrix is 
also weakly stationary. 

A model for some time series $\{y_t;\ t \in T\}$ would be of the form

(17.2.9)  $y_t = m_t + x_t$

where $m_t$ is a constant for each t and $x_t$ is such that $\{x_t;\ t \in T\}$ is a stationary
time series with $\mathcal{E}(x_t) = 0$. Hence

(17.2.10)  $\mathcal{E}(y_t) = m_t$

and the covariance function of $\{y_t;\ t \in T\}$ is identical with that of
$\{x_t;\ t \in T\}$, that is (in the real case),

(17.2.11)  $\gamma_h = \mathcal{E}(y_{t+h} - m_{t+h})(y_t - m_t) = \mathcal{E}(x_{t+h} x_t)$.

One of the problems of time series is to estimate $m_t$ or $\gamma_h$ or to test
hypotheses concerning $m_t$ or $\gamma_h$ from observations on a finite number n
of random variables taken from the time series, where, of course, n may
be allowed to $\to \infty$.

17.3 THE SPECTRAL FUNCTION OF A STATIONARY 
TIME SERIES 


(a) A Special Case 

As far as the first two moments of a stationary time series are concerned
the time series is described by the covariance function $\gamma_h$. We shall show
that the covariance function itself can be expressed in terms of what is
called the spectral distribution function of the time series.

First let us consider the relationship between the covariance function
and the spectral distribution function of the special stationary time series
given by (17.2.4). It is readily verified that for all t we have

(17.3.1)  $\mathcal{E}(x_t) = \sum_{p=1}^{r} a_p\, \mathcal{E} \cos(t\omega_p + u_p) = 0$.

For the covariance function $\gamma_h = \mathcal{E}(x_s x_t)$ we have

(17.3.2)  $\mathcal{E}(x_s x_t) = \sum_{p,q=1}^{r} a_p a_q\, \mathcal{E} \cos(s\omega_p + u_p) \cos(t\omega_q + u_q)$.

But it is seen that $\mathcal{E} \cos(s\omega_p + u_p) \cos(t\omega_q + u_q) = 0$, $q \ne p$. Using the
fact that for $q = p$

$\cos(s\omega_p + u_p) \cos(t\omega_p + u_p) = \tfrac{1}{2} \cos((s + t)\omega_p + 2u_p) + \tfrac{1}{2} \cos (s - t)\omega_p$

it is seen that

$\mathcal{E} \cos(s\omega_p + u_p) \cos(t\omega_p + u_p) = \tfrac{1}{2} \cos (s - t)\omega_p$.

Therefore, putting $t - s = h$, we have

(17.3.3)  $\gamma_h = \tfrac{1}{2} \sum_{p=1}^{r} a_p^2 \cos(h\omega_p) = \int_{-\pi}^{+\pi} \cos(h\omega)\, dF(\omega)$

where $F(\omega)$ is the nondecreasing step function of $\omega$ on $[-\pi, +\pi]$ defined
as

(17.3.4)  $F(\omega) = \tfrac{1}{2} \sum_{\omega_p \le \omega} a_p^2$.

At the end points of $[-\pi, +\pi]$, we have $F(-\pi) = 0$ and $F(+\pi) = \gamma_0$.
$F(\omega)$ is called the spectral distribution function associated with the covariance
function $\gamma_h$, $h = \ldots, -1, 0, +1, \ldots$, in (17.3.3). The expression at the
extreme right of (17.3.3) is also referred to as the spectral representation
of $\gamma_h$.
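
The first equality in (17.3.3) can be checked numerically for a small case. The Python sketch below is an illustration under stated assumptions: it takes r = 2, the amplitudes and frequencies are invented, and the mean value over the phases $u_1, u_2$ is computed by averaging over an evenly spaced grid on $[-\pi, +\pi)$, which is exact here (up to rounding) because the integrand is a trigonometric polynomial of low degree in the phases. The function names are hypothetical.

```python
import numpy as np


def covariance_closed_form(a, omega, h):
    """gamma_h = (1/2) * sum_p a_p^2 cos(h * omega_p), as in (17.3.3)."""
    a, omega = np.asarray(a, dtype=float), np.asarray(omega, dtype=float)
    return float(0.5 * np.sum(a ** 2 * np.cos(h * omega)))


def covariance_by_averaging(a, omega, s, t, n_grid=64):
    """Mean value of x_s * x_t for r = 2, averaged over a grid of phases."""
    u = -np.pi + 2.0 * np.pi * np.arange(n_grid) / n_grid
    u1, u2 = np.meshgrid(u, u, indexing="ij")   # independent phases u_1, u_2

    def x(time):
        # The series (17.2.4) with two components, evaluated on the phase grid.
        return (a[0] * np.cos(time * omega[0] + u1)
                + a[1] * np.cos(time * omega[1] + u2))

    return float(np.mean(x(s) * x(t)))


# Hypothetical amplitudes and frequencies (assumed values).
A = [1.0, 0.5]
OMEGA = [0.4, 2.0]

if __name__ == "__main__":
    print(covariance_by_averaging(A, OMEGA, 3, 7), covariance_closed_form(A, OMEGA, 4))
```

Evaluating `covariance_by_averaging` at different pairs (s, t) with the same difference t - s gives the same value, which is the weak stationarity property used throughout this section.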

In the case of complex time series, (17.2.6), it can be verified by fairly
simple analysis that the covariance function is

(17.3.5)  $\gamma_h = \tfrac{1}{2} \sum_{p=1}^{r} a_p^2 e^{ih\omega_p} = \int_{-\pi}^{+\pi} e^{ih\omega}\, dF(\omega)$,

where $F(\omega)$ is identically the same spectral distribution as that in (17.3.3).
(b) The General Case 

Now for a given time unit suppose we have an arbitrary real stationary
time series

(17.3.6)  $x_t$,  $t = \ldots, -1, 0, +1, \ldots$

with $\mathcal{E}(x_t) = 0$ and covariance function $\gamma_h$. In this case we have $\gamma_{-h} = \gamma_h$.
For any integer M, let $F_M(\omega)$ be defined for any $\omega$ on $[-\pi, +\pi]$ as follows:

(17.3.7)  $F_M(\omega) = \dfrac{1}{\pi M} \int_{-\pi}^{\omega} \mathcal{E}(x_1 \cos \omega + \cdots + x_M \cos M\omega)^2\, d\omega$

          $= \dfrac{1}{\pi M} \int_{-\pi}^{\omega} \sum_{p,q=1}^{M} \gamma_{p-q} \cos p\omega \cos q\omega\, d\omega$

          $= \dfrac{1}{\pi M} \left\{ \sum_{p \ne q = 1}^{M} \gamma_{p-q} \left[ \dfrac{\sin (p + q)\omega}{2(p + q)} + \dfrac{\sin (p - q)\omega}{2(p - q)} \right] + \gamma_0 \sum_{p=1}^{M} \left[ \dfrac{\sin 2p\omega}{4p} + \dfrac{\omega + \pi}{2} \right] \right\}$.

$F_M(\omega)$ is nondecreasing on $[-\pi, +\pi]$ and furthermore $F_M(-\pi) = 0$,
$F_M(+\pi) = \gamma_0$. As $M \to \infty$, $F_M(\omega)$ has as its limit a nondecreasing
function $F(\omega)$ on $[-\pi, \pi]$ such that $F(-\pi) = 0$ and $F(+\pi) = \gamma_0$. (For
existence and uniqueness of such a limit function $F(\omega)$ see discussion in
proof of 5.4.1.) Note that $dF_M(\omega)$ and $dF(\omega)$ are symmetric about $\omega = 0$.
Since $\gamma_{-h} = \gamma_h$, it is sufficient to consider h to be zero or a positive
integer. We shall show that for $h = 0, 1, 2, \ldots$

(17.3.8)  $\gamma_h = \int_{-\pi}^{+\pi} \cos h\omega\, dF(\omega)$.

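Consider a sequence of integers $M_1 < M_2 < \cdots$. For $h = 0, 1, \ldots, M_\alpha$ we have

$\int_{-\pi}^{+\pi} \cos h\omega\, dF_{M_\alpha}(\omega) = \dfrac{1}{\pi M_\alpha} \int_{-\pi}^{+\pi} \sum_{p,q=1}^{M_\alpha} \gamma_{p-q} \cos p\omega \cos q\omega \cos h\omega\, d\omega = \dfrac{1}{\pi M_\alpha} (A_h + B_h)$

where

$A_h = \int_{-\pi}^{+\pi} \sum_{p \ne q = 1}^{M_\alpha} \gamma_{p-q} \cos p\omega \cos q\omega \cos h\omega\, d\omega$

and

$B_h = \gamma_0 \int_{-\pi}^{+\pi} \sum_{p=1}^{M_\alpha} \cos^2 p\omega \cos h\omega\, d\omega$.

Using the fact that

$\cos p\omega \cos q\omega \cos h\omega = \tfrac{1}{4} [\cos (p + q + h)\omega + \cos (p + q - h)\omega + \cos (p - q + h)\omega + \cos (p - q - h)\omega]$

$\cos^2 p\omega \cos h\omega = \tfrac{1}{4} [\cos (2p + h)\omega + \cos (2p - h)\omega + 2 \cos h\omega]$

and that $\gamma_{p-q} = \gamma_{q-p}$ we find

$A_h = \dfrac{\pi}{2} \sum_{p + q = h,\ p \ne q} \gamma_{p-q} + \pi \gamma_h (M_\alpha - h)$,  $h \ne 0$
     $= 0$,  $h = 0$

and

$B_h = \dfrac{\pi}{2}\, \delta_h \gamma_0$,  $h \ne 0$
     $= \pi M_\alpha \gamma_0$,  $h = 0$

where $\delta_h = \tfrac{1}{2} [1 + (-1)^h]$.
Therefore, for $h = 0, 1, \ldots, M_\alpha$ we have

(17.3.9)  $\int_{-\pi}^{+\pi} \cos h\omega\, dF_{M_\alpha}(\omega) = \left( 1 - \dfrac{h}{M_\alpha} \right) \gamma_h + \dfrac{1}{2 M_\alpha} \left[ \sum_{p + q = h,\ p \ne q} \gamma_{p-q} + \delta_h \gamma_0 \right]$,  $h \ne 0$
          $= \gamma_0$,  $h = 0$.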

If we let $M_\alpha \to \infty$, we obtain (17.3.8) for any non-negative integer h.
Now every convergent subsequence of $\{F_M(\omega)\}$, and hence the sequence
$\{F_M(\omega)\}$ itself, yields the same result. Since $dF(\omega)$ and also $\cos h\omega$ are
both symmetric in $\omega$ about $\omega = 0$ we can write (17.3.8) in the form

(17.3.10)  $\gamma_h = 2 \int_{0}^{\pi} \cos h\omega\, dF(\omega)$.

$F(\omega)$ is called the spectral distribution function of the stationary time
series $x_t$. If we normalize $F(\omega)$ by dividing by $\gamma_0$ then the ratio $F(\omega')/\gamma_0$
is the fraction of spectral mass produced by all frequencies $\omega < \omega'$.
If $F(\omega)$ is absolutely continuous with derivative $f(\omega)$, then $f(\omega)$ is called
the spectral density function of the time series. Summarizing, we have the
following result:

17.3.1 If $x_t$, $t = \ldots, -1, 0, +1, \ldots$ is a real stationary time series with
$\mathcal{E}(x_t) = 0$, and covariance function $\gamma_h$, $h = 0, 1, 2, \ldots$ then $\gamma_h$ is
given by (17.3.10) where the spectral distribution function $F(\omega)$ is
$\lim_{M \to \infty} F_M(\omega)$, and where $F_M(\omega)$ is given by (17.3.7).

For a complex stationary time series $x_t$, $t = \ldots, -1, 0, +1, \ldots$ with
covariance function $\gamma_h = \mathcal{E}(x_{t+h} \bar{x}_t)$, $h = \ldots, -1, 0, +1, \ldots$, the theorem
corresponding to 17.3.1 is due to Herglotz (1911), and states that

(17.3.11)  $\gamma_h = \int_{-\pi}^{+\pi} e^{ih\omega}\, dF(\omega)$

where

(17.3.12)  $F(\omega) = \lim_{M \to \infty} F_M(\omega)$

and

(17.3.13)  $F_M(\omega) = \dfrac{1}{2\pi M} \int_{-\pi}^{\omega} \mathcal{E}\left( \left| \sum_{p=1}^{M} x_p e^{-ip\omega} \right|^2 \right) d\omega$.

The quantity under the integral is seen to be real and non-negative.

The proof of the theorem for the complex case is similar to that of
17.3.1 and is left to the reader.

Results (17.3.10) and (17.3.11) for real and complex stationary time
series are general enough for most stationary time series problems that
occur in physical processes since the time unit can always be chosen to fit
the problem. However, an extension of (17.3.11) to the case where the
range of h is the real (time) axis rather than multiples of the given time unit
has been established by Bochner (1932). His result states that if, for a
complex stationary time series $\{x_t:\ -\infty < t < +\infty\}$, $\gamma_h$ is continuous
at $h = 0$ then

(17.3.14)  $\gamma_h = \int_{-\infty}^{+\infty} e^{ih\omega}\, dF(\omega)$.

In the case of a real time series the corresponding generalization of
(17.3.10) is that

(17.3.15)  $\gamma_h = 2 \int_{0}^{\infty} \cos h\omega\, dF(\omega)$.


(c) White Noise 

If a real stationary time series $x_t$, $t = \ldots, -1, 0, +1, \ldots$, has the
constant spectral density function $f(\omega) = \gamma_0 / 2\pi$ on $[-\pi, +\pi]$ then it will
be seen from (17.3.8) that $\gamma_h = 0$ for all $h \ne 0$. Conversely, suppose
$\gamma_h = 0$ for all $h \ne 0$. Then we have

(17.3.16)  $\int_{-\pi}^{+\pi} \cos h\omega\, f(\omega)\, d\omega = 0$,  $h \ne 0$
           $= \gamma_0$,  $h = 0$.

If $f(\omega)$ is symmetric around $\omega = 0$, and can be represented as a Fourier
series $b_0 + b_1 \cos \omega + b_2 \cos 2\omega + \cdots$ we have

(17.3.17)  $\int_{-\pi}^{+\pi} \cos h\omega\, f(\omega)\, d\omega = \int_{-\pi}^{+\pi} [b_0 + b_1 \cos \omega + b_2 \cos 2\omega + \cdots] \cos h\omega\, d\omega$,  $h = 0, 1, \ldots$.

But (17.3.17) reduces to

(17.3.18)  $\int_{-\pi}^{+\pi} \cos h\omega\, f(\omega)\, d\omega = b_h \int_{-\pi}^{+\pi} \cos^2 h\omega\, d\omega$.

The left-hand side vanishes for $h \ne 0$ as stated in (17.3.16) and hence
$b_h = 0$ for $h \ne 0$. Therefore, $f(\omega) = b_0$, and it follows from the fact that

$\int_{-\pi}^{+\pi} f(\omega)\, d\omega = \gamma_0$

that $b_0 = \gamma_0 / 2\pi$. Therefore on $[-\pi, +\pi]$

(17.3.19)  $f(\omega) = \gamma_0 / 2\pi$.

Thus,

17.3.3 If a real stationary time series $x_t$, $t = \ldots, -1, 0, +1, \ldots$, has
a symmetric spectral density function $f(\omega)$, representable as a
Fourier series of form $b_0 + b_1 \cos \omega + b_2 \cos 2\omega + \cdots$ on
$[-\pi, +\pi]$, a necessary and sufficient condition for the covariance
function $\gamma_h$ to vanish for all $h \ne 0$ is that $f(\omega)$ be constant on
$[-\pi, +\pi]$, in which case $f(\omega) = \gamma_0 / 2\pi$.

A stationary time series $x_t$, $t = \ldots, -1, 0, +1, \ldots$, in which every
pair $x_t$, $x_{t'}$, $t \ne t'$, has zero correlation is called a white noise. 17.3.3 thus
states a necessary and sufficient condition for a time series to be a white
noise for a certain class of spectral density functions.
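
The statement of 17.3.3 can be illustrated numerically by evaluating the integral in (17.3.16) for a constant density and for a density that is not constant. The Python sketch below is only an illustration: the function names are hypothetical, the value of $\gamma_0$ and the second density are invented, and the integral is approximated by an average over an evenly spaced grid on $[-\pi, +\pi)$, which is accurate for smooth periodic integrands such as these.

```python
import numpy as np

GAMMA_0 = 2.0  # assumed variance of the series, chosen for illustration


def covariance_from_density(f, h, n_grid=4096):
    """Approximate gamma_h = integral over [-pi, pi] of cos(h w) f(w) dw."""
    omega = -np.pi + 2.0 * np.pi * np.arange(n_grid) / n_grid
    return float(2.0 * np.pi * np.mean(np.cos(h * omega) * f(omega)))


def constant_density(omega):
    """White-noise density f(w) = gamma_0 / (2 pi), as in (17.3.19)."""
    return np.full_like(omega, GAMMA_0 / (2.0 * np.pi))


def nonconstant_density(omega):
    """A hypothetical symmetric density with b_1 different from zero."""
    return GAMMA_0 / (2.0 * np.pi) * (1.0 + 0.5 * np.cos(omega))


if __name__ == "__main__":
    print([round(covariance_from_density(constant_density, h), 6) for h in range(4)])
    print([round(covariance_from_density(nonconstant_density, h), 6) for h in range(4)])
```

For the constant density every covariance with $h \ne 0$ is zero up to rounding, while the second density produces a nonzero $\gamma_1$, so the corresponding series would not be a white noise.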

17.4 ESTIMATION OF MEAN AND COVARIANCE FUNCTION 
OF A STATIONARY TIME SERIES 

(a) Estimation of the Mean 

Suppose $x_t$, $t = \ldots, -1, 0, +1, \ldots$, is a real stationary time series
with $\mathcal{E}(x_t) = \mu$, and that we wish to estimate $\mu$ from the finite sample
$(x_1, \ldots, x_n)$. The mean $\bar{x} = (x_1 + \cdots + x_n)/n$ is an unbiased estimator
for $\mu$ since

(17.4.1)  $\mathcal{E}(\bar{x}) = \mu$.

The variance of $\bar{x}$ is given by

(17.4.2)  $\sigma^2(\bar{x}) = \dfrac{1}{n} \left[ \gamma_0 + 2 \sum_{h=1}^{n-1} \left( 1 - \dfrac{h}{n} \right) \gamma_h \right]$.

If $F(\omega)$ has a derivative $f(\omega)$ which is symmetric about $\omega = 0$, continuous
at $\omega = 0$, and can be represented by the Fourier series

(17.4.3)  $\dfrac{1}{2\pi} (\gamma_0 + 2\gamma_1 \cos \omega + 2\gamma_2 \cos 2\omega + \cdots)$

on $[-\pi, +\pi]$, it can be verified that as $n \to \infty$ the quantity in [ ] in
(17.4.2) has the value $2\pi f(0)$ as its limit. Therefore, for large n we have,
under these conditions,

(17.4.4)  $\sigma^2(\bar{x}) \cong \dfrac{2\pi f(0)}{n}$.
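
Formula (17.4.2) and the approximation (17.4.4) can be compared numerically. The Python sketch below assumes, purely for illustration, a covariance function of the form $\gamma_h = \rho^h$ with $\rho = 0.5$; this choice is not taken from the text. For it the Fourier series (17.4.3) gives $2\pi f(0) = \gamma_0 + 2(\gamma_1 + \gamma_2 + \cdots) = (1 + \rho)/(1 - \rho)$. The function name `variance_of_mean` is hypothetical.

```python
import numpy as np


def variance_of_mean(gamma, n):
    """Variance of the sample mean by (17.4.2); gamma[h] = gamma_h, h = 0..n-1."""
    gamma = np.asarray(gamma, dtype=float)
    h = np.arange(1, n)
    return float((gamma[0] + 2.0 * np.sum((1.0 - h / n) * gamma[1:n])) / n)


RHO = 0.5  # assumed parameter of the illustrative covariance function

if __name__ == "__main__":
    n = 2000
    gamma = RHO ** np.arange(n)                 # gamma_h = rho^h (assumption)
    limit = (1.0 + RHO) / (1.0 - RHO)           # 2 * pi * f(0) for this gamma_h
    print(n * variance_of_mean(gamma, n), limit)
```

The product of n and the variance approaches $2\pi f(0)$ from below as n grows, in line with (17.4.4); for small n the exact formula (17.4.2) should be used instead of the approximation.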

(b) Estimation of the Covariance 
If we let our estimator for $\gamma_h$ be

(17.4.5)  $c_h = \dfrac{1}{n - h} \sum_{t=1}^{n-h} (x_t - \bar{x})(x_{t+h} - \bar{x})$,

we find after some algebraic manipulation that 

(17.4.6) AO = n - As) + ^R 

n 

where 

(17.4.7)
It can be shown that $R \to 0$ as $n \to \infty$; hence for large n the asymptotic
bias in $c_h$ as an estimator for $\gamma_h$ is approximately $-\sigma^2(\bar{x})$. If the spectral
distribution has a density function $f(\omega)$ continuous at $\omega = 0$, then we
have for large n

(17.4.8)  $\mathcal{E}(c_h) \cong \gamma_h - \dfrac{2\pi f(0)}{n}$.

It should be noted that if $\mu$ is known and if $\bar{x}$ is replaced by $\mu$ in (17.4.5),
the mean value of the resulting covariance is exactly $\gamma_h$.

The problem of determining the variance of $c_h$ involves finiteness
assumptions about fourth order moments of the time series $x_t$,
$t = \ldots, -1, 0, +1, \ldots$ and some involved algebra. However, an
approximation can be obtained if $x_t$, $t = \ldots, -1, 0, +1, \ldots$ is strictly
stationary. Consider the time series = (x^ — tx){x^^f^ “ iw), r = . . . , 
— 1, 0, +1, . . . which is strictly stationary if x^ is. The mean of this time 
series is Let its covariance function exist and be denoted by Then 

(17.4.9) rl, = 

Now it can be shown that for large n the variance of Cf^ is approximated 
by that of c*, where 

(17.4.10) ct = — 

n — h§ = i 

The variance of c* has the same structure as that of a\x) and is given by 

(17.4.11) Ac?)-[yw + 2 ”! '(1 - -^)yA,fl. 

n — hL 1=1 \ n — h/ J 

The series converges if the time series has a spectral density function, 
and for large n 

ylo{^ - -) 

(17.4.12) ^ 

n 

where it will be recalled that

$\gamma_{h,0} = \mathscr{E}[(x_t - \mu)(x_{t+h} - \mu) - \gamma_h]^2.$

17.5 ESTIMATION OF SPECTRAL DISTRIBUTION 

Suppose a stationary time series $x_t$, $t = \ldots, -1, 0, +1, \ldots$, has mean
$\mu$ and spectral density function $f(\omega)$. We shall consider the problem of
estimating $f(\omega)$ from a sample $(x_1, \ldots, x_n)$ of the time series, assuming $\mu$
is known.

In view of the definition of $F(\omega)$ as the limit of $F_M(\omega)$ as $M \to \infty$, where
$F_M(\omega)$ is defined in (17.3.7), a quantity which suggests itself as an estimator
for $f(\omega)$ based on $(x_1, \ldots, x_n)$ is $f_n(\omega)$ defined as

(17.5.1)  $f_n(\omega) = \frac{1}{\pi n}(y_1\cos\omega + y_2\cos 2\omega + \cdots + y_n\cos n\omega)^2$

where $y_\xi = x_\xi - \mu$. Taking mean values and summing some simple
trigonometric series we find

(17.5.2)  $\mathscr{E}f_n(\omega) = \frac{1}{2\pi}\left\{\gamma_0\left[1 + \frac{\sin n\omega\,\cos(n+1)\omega}{n\sin\omega}\right] + 2\sum_{h=1}^{n-1}\gamma_h\left[\left(1 - \frac{h}{n}\right)\cos h\omega + \frac{\sin(n-h)\omega\,\cos(n+1)\omega}{n\sin\omega}\right]\right\}.$

Allowing $n \to \infty$ and making use of the Fourier expansion of $f(\omega)$ given
by (17.4.3) it is evident that for any value of $\omega$ in $[-\pi, +\pi]$ except for
$\omega = 0$,

$\lim_{n\to\infty}\mathscr{E}f_n(\omega) = f(\omega)$, whereas $\lim_{n\to\infty}\mathscr{E}f_n(0) = 2f(0)$.

Thus, $f_n(\omega)$ is an asymptotically unbiased estimator for $f(\omega)$ for $\omega$ in
$[-\pi, +\pi]$ except at $\omega = 0$. On the other hand it can be shown under
mild conditions that the variance of $f_n(\omega)$ does not converge to 0 as
$n \to \infty$. In particular, if $(x_1, \ldots, x_n)$ is a normal $n$-dimensional random
variable, $f_n(\omega)$, except for a constant multiplier, has a chi-square distribution
with one degree of freedom for all values of $n$.

Hence, the obvious estimator $f_n(\omega)$ for $f(\omega)$ is not a consistent estimator.
Thus, some other estimator for $f(\omega)$ must be sought. Various consistent
estimators for $f(\omega)$ have been suggested and treated by Bartlett (1950),
Grenander and Rosenblatt (1953), Jenkins and Priestly (1957), Parzen
(1957), Tukey (1949a), and others. Most of these procedures approach
the problem by estimating $f(\omega)$ at a fixed point $\omega$ on its interval $[-\pi, +\pi]$.
Tukey's method, however, is to consider the problem of estimating spectral
masses within subintervals of $[-\pi, +\pi]$. We shall consider a method
similar to Tukey's and make no attempt to discuss the other estimators.
The reader interested in them should refer to the books by Bartlett (1950)
and by Grenander and Rosenblatt (1957).

We shall assume that for the time series from which the sample
$(x_1, \ldots, x_n)$ comes, there exists a spectral density function $f(\omega)$ which
can be represented by the Fourier expansion (17.4.3). Now for a given
positive integer $m$, we shall proceed to estimate from the sample
$(x_1, \ldots, x_n)$, the spectral masses

(17.5.3)  $\Lambda_p = F\left(\frac{p\pi}{m} + \frac{\pi}{2m}\right) - F\left(\frac{p\pi}{m} - \frac{\pi}{2m}\right)$

contained in the intervals

(17.5.4)  $\left(\frac{p\pi}{m} - \frac{\pi}{2m},\ \frac{p\pi}{m} + \frac{\pi}{2m}\right], \quad p = 0, \pm 1, \pm 2, \ldots, \pm m,$

where $F(\omega)$ is the spectral distribution function.

The number m is chosen so as to balance the uncertainties of attempting 
to estimate too many spectral masses against the need to study the spectrum 
with sufficient resolution. This means roughly that we select m as the 
number of covariances to give a “reasonable’’ description of the covariance 
function of the time series. 

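The spectral mass $\Lambda_p$ can be expressed in terms of $f(\omega)$ as follows:

(17.5.5)  $\Lambda_p = \int_{p\pi/m - \pi/2m}^{p\pi/m + \pi/2m} f(\omega)\,d\omega.$

Using the Fourier expansion (17.4.3) for $f(\omega)$ and performing the
integration, we obtain

(17.5.6)  $\Lambda_p = \frac{\gamma_0}{2m} + \frac{1}{m}\sum_{h=1}^{\infty}\gamma_h\left(\frac{\sin(h\pi/2m)}{h\pi/2m}\right)\cos\frac{hp\pi}{m}.$

Now if it is considered that the information in the time series is
"reasonably" represented by the covariances $\gamma_0, \gamma_1, \ldots, \gamma_m$, then the
spectral masses $\Lambda_p$, $p = 0, \pm 1, \ldots, \pm m$ may be approximated by
truncating the infinite series in (17.5.6) at $h = m$, thus giving the approximate
spectral masses

(17.5.7)  $\Lambda_{p,m} = \frac{\gamma_0}{2m} + \frac{1}{m}\sum_{h=1}^{m}\gamma_h\left(\frac{\sin(h\pi/2m)}{h\pi/2m}\right)\cos\frac{hp\pi}{m}, \quad p = 0, \pm 1, \ldots, \pm m.$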


If $\mu$ is known for the time series, and if a sample $(x_1, \ldots, x_n)$ is taken,
the estimator which suggests itself for the approximate spectral mass $\Lambda_{p,m}$
of (17.5.7) is $\hat{\Lambda}_{p,m}$ where

(17.5.8)  $\hat{\Lambda}_{p,m} = \frac{c_0^*}{2m} + \frac{1}{m}\sum_{h=1}^{m} c_h^*\left(\frac{\sin(h\pi/2m)}{h\pi/2m}\right)\cos\frac{hp\pi}{m},$

where $c_h^* = \frac{1}{n-h}\sum_{\xi=1}^{n-h} y_\xi y_{\xi+h}$, $h = 0, 1, \ldots, n-1$, $m < n-1$, and $y_\xi = x_\xi - \mu$.
It is evident that since $\mathscr{E}(c_h^*) = \gamma_h$, $h = 0, 1, \ldots$, we have

(17.5.9)  $\mathscr{E}(\hat{\Lambda}_{p,m}) = \Lambda_{p,m},$

that is, $\hat{\Lambda}_{p,m}$ is an unbiased estimator for $\Lambda_{p,m}$.

We remark that the spectral mass estimator $\hat{\Lambda}_{p,m}$ is slightly different
from the one suggested by Tukey (1949a). If $\frac{\sin(h\pi/2m)}{h\pi/2m}$ is replaced by
the approximation $(0.46\cos(h\pi/m) + 0.54)$ and if an end point correction
is made by taking only half of the last term in the sum in (17.5.8), Tukey's
estimate of $\Lambda_p$ is obtained.

Since $\hat{\Lambda}_{p,m}$ is an estimator for the spectral mass in the interval $p\pi/m \pm \pi/2m$,
an unbiased estimator for the average spectral density
$\bar{f}_p(\omega)$ over the interval is obtained by dividing the spectral mass estimator
by the interval length $\pi/m$, that is,

(17.5.10)  $\hat{f}_p(\omega) = \frac{\hat{\Lambda}_{p,m}}{\pi/m}.$
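
The estimators (17.5.8) and (17.5.10) translate directly into code. The Python sketch below is offered as an illustration under stated assumptions: the function names, the data, and the value taken for the known mean are all hypothetical.

```python
import math

def lag_products(x, mu, h):
    """c*_h = (1/(n-h)) * sum of y_xi * y_{xi+h}, with y = x - mu and mu known."""
    n = len(x)
    y = [v - mu for v in x]
    return sum(y[i] * y[i + h] for i in range(n - h)) / (n - h)

def spectral_mass(x, mu, m, p):
    """Estimator (17.5.8) of the truncated spectral mass centred at p*pi/m."""
    if not m < len(x) - 1:
        raise ValueError("need m < n - 1")
    total = lag_products(x, mu, 0) / (2 * m)
    for h in range(1, m + 1):
        a = h * math.pi / (2 * m)
        weight = math.sin(a) / a                  # sin(h*pi/2m) / (h*pi/2m)
        total += lag_products(x, mu, h) * weight * math.cos(h * p * math.pi / m) / m
    return total

def average_density(x, mu, m, p):
    """Estimator (17.5.10): spectral mass divided by the interval length pi/m."""
    return spectral_mass(x, mu, m, p) * m / math.pi

x = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 3.0, 2.0]   # illustrative data only
mu, m = 3.0, 3                                  # mean assumed known for the example
masses = [spectral_mass(x, mu, m, p) for p in range(-m + 1, m + 1)]
print(masses)
```

One check on the arithmetic: the cosine terms cancel when the estimates are summed over $2m$ consecutive values of $p$, so that sum equals $c_0^*$.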

It can be shown that the variance of the estimator — A in (17.5.10) 
for large n and large m (where m = 0(/i)) is ^ 


(17.5.11) 



;-/?(«>) i (■ 

n A=-m\ 


sin (h7r/2m)\^ 


hnllm 


-J 


cos" 


hpir 

m 


If the mean $\mu$ is unknown for the time series we replace $\mu$ by the sample
mean $\bar{x}$ in defining $c_h^*$ and use these modified forms of $c_h^*$, $h = 0, 1, \ldots, m$,
in (17.5.8) to obtain the estimator for the approximate spectral mass in (17.5.7). It can be
shown that (17.5.9) and (17.5.11) are still valid for large $n$ for these
changes in $c_h^*$.


17.6 STATISTICAL TESTS FOR PARAMETRIC TIME SERIES 

A full treatment of the various statistical inference problems concerning 
stationary time series is beyond the scope of this book. Here we shall 
discuss estimation and statistical testing only in the cases of several 
classical parametric time series. The reader interested in the treatment of 
further problems is referred to the books by Bartlett (1955) and Grenander 
and Rosenblatt (1957). 

(a) The Variate Difference Method 

Suppose the time series $x_t$, $t = \ldots, -1, 0, +1, \ldots$ is known to be of
the form

(17.6.1)  $x_t = \sum_{p=0}^{k}\beta_p t^p + \varepsilon_t$

where $\beta_0, \beta_1, \ldots, \beta_k$ are unknown, and where $\varepsilon_t$ is a white noise with
variance $\sigma^2$.

If $k$ is known, then for a sample $(x_1, \ldots, x_n)$, $n > k + 1$, minimum
variance estimators for the $\beta$'s may be obtained by least squares and the
variances of the estimators may be obtained by the methods discussed in
Section 10.3(c). The same procedure applies if each $t^p$ is replaced by a
polynomial $g_p(t)$ of degree $p$, $p = 0, 1, \ldots, k$, where $g_0(t), \ldots, g_k(t)$
are functionally independent.

However, if $k$ is unknown in (17.6.1), one way of estimating it is by a
semiempirical procedure known as the variate difference method which
works as follows. Write $\mu_t = \sum_{p=0}^{k}\beta_p t^p$ for the nonrandom part of
(17.6.1), and let $\Delta^h x_t$ be a time series defined by the $h$th forward
difference of the time series $x_t$. Then

(17.6.2)  $\Delta^h x_t = \Delta^h\mu_t + \Delta^h\varepsilon_t$

where

(17.6.3)  $\Delta^h\mu_t = \mu_{t+h} - \binom{h}{1}\mu_{t+h-1} + \cdots + (-1)^h\mu_t,$

with a similar expression for $\Delta^h\varepsilon_t$. Now suppose we consider the sequence
of samples $(\Delta^h x_1, \ldots, \Delta^h x_{n-h})$, $h = 1, 2, \ldots$ and form the ratios

(17.6.4)  $Q_h = \frac{1}{(n-h)\binom{2h}{h}}\sum_{\xi=1}^{n-h}(\Delta^h x_\xi)^2, \quad h = 1, 2, \ldots.$

If we take the mean value of $Q_h$ we obtain

(17.6.5)  $\mathscr{E}(Q_h) = \frac{1}{(n-h)\binom{2h}{h}}\sum_{\xi=1}^{n-h}\mathscr{E}(\Delta^h x_\xi)^2.$

Since $\varepsilon_t$ is a white noise and $\mu_t$ is a nonrandom function of $t$, we find

(17.6.6)  $\mathscr{E}(\Delta^h\mu_\xi + \Delta^h\varepsilon_\xi)^2 = (\Delta^h\mu_\xi)^2 + \mathscr{E}(\Delta^h\varepsilon_\xi)^2.$

But

(17.6.7)  $\mathscr{E}(\Delta^h\varepsilon_\xi)^2 = \binom{2h}{h}\sigma^2.$

Substituting from (17.6.7) into (17.6.6) and then into (17.6.5), we find

(17.6.8)  $\mathscr{E}(Q_h) = \sigma^2 + R_h, \quad h = 1, 2, \ldots,$

where $R_h = \frac{1}{(n-h)\binom{2h}{h}}\sum_{\xi=1}^{n-h}(\Delta^h\mu_\xi)^2$.

It will be noted that $R_h$ is non-negative and vanishes for all $h \geq k + 1$.
Thus, $\mathscr{E}(Q_h) = \sigma^2$, for all $h \geq k + 1$.

Thus, in a practical situation, one would calculate $Q_h$, $h = 1, 2, \ldots$ and
choose as the estimate of $k$ the value of $h$ at which $Q_h$ appears to become
constant (except for "small" random fluctuations) and use $Q_h$ itself as an
estimate of $\sigma^2$. The reader interested in a full discussion of the variate
difference method, its various modified forms, and its applications should
consult Tintner (1940) and Quenouille (1948) on the subject.
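
The ratios $Q_h$ are simple to compute. The Python sketch below uses a noise-free quadratic trend, an assumption made so that the result can be checked by hand: with $k = 2$ the third and higher differences vanish, so $Q_h$ falls to $\sigma^2 = 0$ from $h = 3$ onward. The function names are hypothetical.

```python
import math

def forward_difference(x, h):
    """h-th forward difference of the sequence x (length len(x) - h)."""
    d = list(x)
    for _ in range(h):
        d = [d[i + 1] - d[i] for i in range(len(d) - 1)]
    return d

def variate_difference_ratio(x, h):
    """Q_h: mean square of the h-th differences divided by binom(2h, h)."""
    d = forward_difference(x, h)
    return sum(v * v for v in d) / (len(d) * math.comb(2 * h, h))

# Noise-free quadratic trend (k = 2), chosen only for illustration.
trend = [1.0 + 2.0 * t + 0.5 * t * t for t in range(1, 21)]
q = [variate_difference_ratio(trend, h) for h in range(1, 6)]
print(q)
```

With real data the same loop would be run on the observed series, and the order at which the printed values level off would be read from the output.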


(b) Trigonometric Time Series with Known Coefficients and Periods 

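Suppose the time series $x_t$, $t = \ldots, -1, 0, +1, \ldots$, has the following
periodic parametric form

(17.6.9)  $x_t = \sum_{p=1}^{k}(a_p\cos\omega_p t + b_p\sin\omega_p t) + \varepsilon_t$

where $\varepsilon_t$ is a white noise with known variance $\sigma^2$, $a_p$, $b_p$, and $\omega_p$ are known
(real) constants, and $0 < \omega_p < \pi$.

Now consider a sample $(x_1, \ldots, x_n)$ from the time series. If we multiply
(17.6.9) by $\cos\omega t$, $0 < \omega < \pi$, sum over $t$ and divide by $n$, we obtain

(17.6.10)  $\alpha(\omega) = \sum_{p=1}^{k}[a_p u_p(\omega) + b_p v_p(\omega)] + \delta(\omega)$

where

(17.6.11)
$\alpha(\omega) = \frac{1}{n}\sum_{\xi=1}^{n} x_\xi\cos\omega\xi,$
$u_p(\omega) = \frac{1}{n}\sum_{\xi=1}^{n}\cos\omega_p\xi\,\cos\omega\xi,$
$v_p(\omega) = \frac{1}{n}\sum_{\xi=1}^{n}\sin\omega_p\xi\,\cos\omega\xi,$
$\delta(\omega) = \frac{1}{n}\sum_{\xi=1}^{n}\varepsilon_\xi\cos\omega\xi.$

Similarly, if we multiply (17.6.9) by $\sin\omega t$, $0 < \omega < \pi$, sum over $t$ and
divide by $n$, we obtain

(17.6.12)  $\alpha'(\omega) = \sum_{p=1}^{k}[a_p u_p'(\omega) + b_p v_p'(\omega)] + \delta'(\omega),$

where $\alpha'(\omega)$, $u_p'(\omega)$, $v_p'(\omega)$, $\delta'(\omega)$ are obtained from $\alpha(\omega)$, $u_p(\omega)$, $v_p(\omega)$, $\delta(\omega)$
respectively, by replacing $\cos\omega\xi$ by $\sin\omega\xi$.

Note that $u_p(\omega)$, $v_p(\omega)$, $u_p'(\omega)$, $v_p'(\omega)$ can be written as follows:

(17.6.13)
$u_p(\omega) = \frac{1}{2n}\sum_{\xi=1}^{n}[\cos(\omega_p + \omega)\xi + \cos(\omega_p - \omega)\xi],$
$v_p(\omega) = \frac{1}{2n}\sum_{\xi=1}^{n}[\sin(\omega_p + \omega)\xi + \sin(\omega_p - \omega)\xi],$
$u_p'(\omega) = \frac{1}{2n}\sum_{\xi=1}^{n}[\sin(\omega_p + \omega)\xi - \sin(\omega_p - \omega)\xi],$
$v_p'(\omega) = \frac{1}{2n}\sum_{\xi=1}^{n}[-\cos(\omega_p + \omega)\xi + \cos(\omega_p - \omega)\xi],$

from which it can be verified that as $n \to \infty$

(17.6.14)  $\mathscr{E}[\alpha(\omega)] \to 0$ for $\omega \neq \omega_p$, $p = 1, \ldots, k$, and $\mathscr{E}[\alpha(\omega)] \to \frac{a_p}{2}$ for $\omega = \omega_p$.

Similarly, as $n \to \infty$

(17.6.15)  $\mathscr{E}[\alpha'(\omega)] \to 0$ for $\omega \neq \omega_p$, $p = 1, \ldots, k$, and $\mathscr{E}[\alpha'(\omega)] \to \frac{b_p}{2}$ for $\omega = \omega_p$.

Thus, if $2\pi/\omega_p$ is a genuine period of the time series $x_t$, $t = \ldots, -1, 0, +1, \ldots$,
then $\alpha(\omega_p)$ will tend to be near $a_p/2$ and $\alpha'(\omega_p)$ will tend to be near $b_p/2$ in
large samples.

If the time series is given and if the periods $2\pi/\omega_p$, $p = 1, \ldots, k$, are
known, but the coefficients $a_p$, $b_p$, $p = 1, \ldots, k$, are unknown, then it
follows from Section 10.3(c) that minimum variance linear estimators for
these coefficients can be found by least squares.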
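
The behavior described by (17.6.14) and (17.6.15) can be seen on a noise-free example. In the Python sketch below the series has a single component whose frequency is an exact multiple of $2\pi/n$, so that $\alpha$ and $\alpha'$ equal $a/2$ and $b/2$ exactly. The values $a = 3$, $b = -1$, $n = 101$ and the function names are assumptions for illustration.

```python
import math

def alpha(x, omega):
    """alpha(omega) = (1/n) * sum_{xi=1}^{n} x_xi * cos(omega * xi), as in (17.6.11)."""
    n = len(x)
    return sum(x[i - 1] * math.cos(omega * i) for i in range(1, n + 1)) / n

def alpha_prime(x, omega):
    """alpha'(omega): the same sum with cos(omega * xi) replaced by sin(omega * xi)."""
    n = len(x)
    return sum(x[i - 1] * math.sin(omega * i) for i in range(1, n + 1)) / n

# Noise-free series with one hypothetical component: a = 3, b = -1.
n = 101
w1 = 2 * math.pi * 5 / n
x = [3.0 * math.cos(w1 * t) - 1.0 * math.sin(w1 * t) for t in range(1, n + 1)]

w_other = 2 * math.pi * 7 / n      # a frequency that is not a period of the series
print(alpha(x, w1), alpha_prime(x, w1), alpha(x, w_other))
```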

(c) Trigonometric Time Series with Unknown Parameters—Periodogram 
Analysis 

If the time series is of the form (17.6.9) where the $a_p$, $b_p$ and $\omega_p$, and
even $k$, are unknown, Schuster (1898) has proposed a method of searching
for possible periods and Fisher (1929) has proposed a method of testing
the significance of suspected periods. We shall discuss this method which
is known as periodogram analysis.

The behavior of the mean values of the functions $\alpha(\omega)$ and $\alpha'(\omega)$ as
described earlier suggests that $\alpha(\omega)$ and $\alpha'(\omega)$ considered as functions of $\omega$
should be useful in screening out true periods if any exist. If we assume
there are no true periods at all, that is, if the $a_p$ and $b_p$ are all zero, then
$x_t$ is a white noise. If $n$ is odd, say $n = 2k + 1$, let

(17.6.16)  $\omega_p = \frac{2\pi p}{2k+1}, \quad p = 1, \ldots, k.$

Now consider $\alpha(\omega_p)$, $\alpha'(\omega_p)$, $p = 1, \ldots, k$, which are all linear functions
of $x_\xi$. The covariance of $\alpha(\omega_p)$ and $\alpha(\omega_q)$, $p \neq q$, is

(17.6.17)  $\mathrm{cov}(\alpha(\omega_p), \alpha(\omega_q)) = \frac{\sigma^2}{(2k+1)^2}\sum_{\xi=1}^{2k+1}\cos\omega_p\xi\,\cos\omega_q\xi = \frac{\sigma^2}{2(2k+1)^2}\sum_{\xi=1}^{2k+1}[\cos(\omega_p + \omega_q)\xi + \cos(\omega_p - \omega_q)\xi].$

But

$\sum_{\xi=1}^{2k+1}\cos(\omega_p \pm \omega_q)\xi$ = real part of $e^{i(\omega_p \pm \omega_q)}\left[\frac{1 - e^{i(\omega_p \pm \omega_q)(2k+1)}}{1 - e^{i(\omega_p \pm \omega_q)}}\right].$

Since $(\omega_p \pm \omega_q)(2k + 1)$ is an integral multiple of $2\pi$ it is evident that the
quantity in [ ] in the preceding expression vanishes.

Hence

$\mathrm{cov}(\alpha(\omega_p), \alpha(\omega_q)) = 0, \quad p \neq q.$

In a similar manner it follows that

$\mathrm{cov}(\alpha'(\omega_p), \alpha'(\omega_q)) = 0, \quad p \neq q,$

and

$\mathrm{cov}(\alpha'(\omega_p), \alpha(\omega_q)) = 0, \quad p, q = 1, \ldots, k.$

Furthermore for $p = q$, (17.6.17) yields the variance of $\alpha(\omega_p)$:

$\sigma^2(\alpha(\omega_p)) = \frac{\sigma^2}{2(2k+1)}.$

In a similar manner we find

$\sigma^2(\alpha'(\omega_p)) = \frac{\sigma^2}{2(2k+1)}.$

Summarizing, we have the following result:

17.6.1 If $x_1, \ldots, x_{2k+1}$ are independent random variables having mean 0 and
variance $\sigma^2$, then $\alpha(\omega_p)$, $\alpha'(\omega_p)$, $p = 1, \ldots, k$, where $\omega_p$ is given by
(17.6.16), are random variables having zero means, zero covariances,
and variances $\sigma^2/(4k + 2)$.


We are now in a position to examine the sample $(x_1, \ldots, x_{2k+1})$ for
possible periods of lengths $2\pi/\omega_p$, $p = 1, \ldots, k$. If $2\pi/\omega_p$ is a
genuine period then we know from the behavior of $\alpha(\omega_p)$ and $\alpha'(\omega_p)$ that
both of these will tend to have values away from 0 for large $n$ $(= 2k + 1)$.
This, in turn, means that $\alpha^2(\omega_p) + \alpha'^2(\omega_p)$ will tend to have a significantly
large positive value if $2\pi/\omega_p$ is a genuine period. But, for the given
sample $(x_1, \ldots, x_{2k+1})$, $\alpha^2(\omega_p) + \alpha'^2(\omega_p)$ has a value for each $\omega_p$,
$p = 1, \ldots, k$. So we need a way of testing whether the largest one (or
the $m$th largest) of the quantities $\alpha^2(\omega_p) + \alpha'^2(\omega_p)$, $p = 1, \ldots, k$, is
significantly large under the assumption that $\{x_t\}$ is a white noise. Another
complication, of course, is that $\sigma^2$ is usually unknown. The quantity

(17.6.18)  $I_{2k+1}(\omega) = \frac{2k+1}{2\pi}[\alpha^2(\omega) + \alpha'^2(\omega)]$

is called the periodogram of $(x_1, \ldots, x_{2k+1})$.

To deal with the problem of significance testing, we make the further
assumption that $\{x_t\}$ is normal white noise, that is, $x_1, \ldots, x_{2k+1}$ are
independent and all have the distribution $N(0, \sigma^2)$. Then

$\frac{\sqrt{2(2k+1)}}{\sigma}\alpha(\omega_p), \quad \frac{\sqrt{2(2k+1)}}{\sigma}\alpha'(\omega_p), \quad p = 1, \ldots, k$

are $2k$ independent random variables having normal distributions $N(0, 1)$.
Let

(17.6.19)  $u_p = \frac{2k+1}{\sigma^2}[\alpha^2(\omega_p) + \alpha'^2(\omega_p)] = \frac{2\pi}{\sigma^2}I_{2k+1}(\omega_p), \quad p = 1, \ldots, k.$

Then $u_1, \ldots, u_k$ are independent random variables having probability
element

(17.6.20)  $e^{-(u_1 + \cdots + u_k)}\,du_1\cdots du_k$

if $u_p > 0$, $p = 1, \ldots, k$, and 0 otherwise. The problem of whether the
largest (or the $m$th largest) of the quantities $\alpha^2(\omega_p) + \alpha'^2(\omega_p)$ (or $I_{2k+1}(\omega_p)$)
is significantly large reduces to testing whether the largest (or the $m$th largest) of $u_1, \ldots, u_k$ is
significantly large. Since $\sigma^2$ is unknown, the test which suggests itself is
whether the largest (or the $m$th largest) of the ratios

(17.6.21)  $\frac{u_p}{u_1 + \cdots + u_k}, \quad p = 1, \ldots, k$

is significantly large. This ratio, it should be noted, does not depend on $\sigma^2$.

Let $v_r$ be the $r$th smallest of $u_1, \ldots, u_k$, $r = 1, \ldots, k$. Then $v_1, \ldots, v_k$,
$0 < v_1 < v_2 < \cdots < v_k < +\infty$, are the order statistics of $u_1, \ldots, u_k$ and
their probability element is

(17.6.22)  $k!\,e^{-(v_1 + \cdots + v_k)}\,dv_1\cdots dv_k.$

Let

(17.6.23)  $g = \frac{v_k}{v_1 + \cdots + v_k}.$

Then $g$ is identically the same random variable as the largest of the ratios
in (17.6.21). We shall obtain the distribution function of $g$ by first finding
the $h$th moment of $g$ using a simple but effective device described below.
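It will be noted that the $h$th moment of $g$ is given by

(17.6.24)  $\mathscr{E}(g^h) = k!\int_0^\infty\cdots\int_0^\infty\left[\int\cdots\int_{0<v_1<\cdots<v_k} v_k^h\,e^{-(1+z)(v_1+\cdots+v_k)}\,dv_1\cdots dv_k\right]dz_1\cdots dz_h$

where $z = (z_1 + \cdots + z_h)$. But, we find by integrating with respect to
$v_1, \ldots, v_{k-1}$ that

(17.6.25)  $\int\cdots\int_{0<v_1<\cdots<v_{k-1}<v_k} e^{-(1+z)(v_1+\cdots+v_{k-1})}\,dv_1\cdots dv_{k-1} = \frac{1}{(k-1)!}\left[\frac{1 - e^{-(1+z)v_k}}{1+z}\right]^{k-1}.$

Putting this expression in (17.6.24), expanding $(1 - e^{-(1+z)v_k})^{k-1}$, performing
the integration with respect to $v_k$, and then integrating with respect to $z_1, \ldots, z_h$
we obtain

(17.6.26)  $\mathscr{E}(g^h) = k(k-1)\sum_{p=0}^{k-1}(-1)^p\binom{k-1}{p}\frac{\Gamma(h+1)\,\Gamma(k-1)}{(p+1)^{h+1}\,\Gamma(h+k)}.$

But

(17.6.27)  $\frac{\Gamma(h+1)\,\Gamma(k-1)}{\Gamma(h+k)} = \int_0^1 w^h(1-w)^{k-2}\,dw.$

Thus, we can write

(17.6.28)  $\mathscr{E}(g^h) = k(k-1)\sum_{p=0}^{k-1}(-1)^p\binom{k-1}{p}\frac{1}{(p+1)^{h+1}}\int_0^1 w^h(1-w)^{k-2}\,dw.$

Putting $w/(p+1) = y$, (17.6.28) reduces to

(17.6.29)  $\mathscr{E}(g^h) = k(k-1)\sum_{p=0}^{k-1}(-1)^p\binom{k-1}{p}\int_0^{1/(p+1)} y^h[1 - (p+1)y]^{k-2}\,dy.$

So if we put

$G_p(y) = [1 - (p+1)y]^{k-2}$ for $0 < y < \frac{1}{p+1}$, and $G_p(y) = 0$ otherwise,

then

(17.6.30)  $\mathscr{E}(g^h) = k(k-1)\int_0^1 y^h\sum_{p=0}^{k-1}(-1)^p\binom{k-1}{p}G_p(y)\,dy$

which holds for $h = 1, 2, \ldots$.

Thus, it follows from 5.5.1a that $g$ and $y$ have the same distribution.
The p.d.f. of $g$ is therefore

(17.6.31)  $f(g)\,dg = k(k-1)\sum_{p=0}^{k-1}(-1)^p\binom{k-1}{p}G_p(g)\,dg.$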

The probability that $g > g'$ is readily found by integrating $f(g)\,dg$ to be

(17.6.32)  $P(g > g') = \sum_{p=0}^{r}(-1)^p\binom{k}{p+1}[1 - (p+1)g']^{k-1}$

where $r$ is the largest integer $\leq k - 1$ for which $1 - (r+1)g' > 0$.
This result was originally obtained by Fisher (1929) by a different method.
Thus, for a given $k$ and a given $\alpha$ the value of $g_\alpha$ for which $P(g > g_\alpha) = \alpha$
would be the critical value of $g$ for the $100\alpha\%$ level of significance. Davis
(1941) has tabulated $P(g > g_\alpha)$ for $g_\alpha k = 0.10(0.10)10.0$, $k = 10(10)70$;
and for $g_\alpha k = 5.1(0.1)10.1$, $k = 80(10)160(20)300$.

Summarizing, we have the following result:

17.6.2 Suppose $x_1, \ldots, x_{2k+1}$ are independent random variables having the
distribution $N(0, \sigma^2)$ and $u_p$, $p = 1, \ldots, k$, are defined as in
(17.6.19), where $\alpha(\omega_p)$ and $\alpha'(\omega_p)$ are defined in (17.6.11) and
(17.6.12). Then, if $g = \max\{u_p\}/(u_1 + \cdots + u_k)$, the probability
element of $g$ is given by (17.6.31) while $P(g > g')$ for any $g'$
on (0, 1) is given by (17.6.32).

It can be shown by methods similar to those used above that if $g$ is
defined as the $m$th largest of $u_1, \ldots, u_k$ divided by $u_1 + \cdots + u_k$ then

(17.6.33)  $P(g > g') = \frac{k!}{(m-1)!}\sum_{p=m}^{r}\frac{(-1)^{p-m}(1 - pg')^{k-1}}{p\,(k-p)!\,(p-m)!}$

where $r$ is the largest integer for which $1 - rg' > 0$.

Finally, it should be noted that in the preceding discussion it has been
assumed that the mean of the $x_\xi$ is 0. Now if the mean is unknown,
approximations to the preceding results, for large $n$ at least, can be
obtained by replacing $x_\xi$ by $x_\xi - \bar{x}$, $\xi = 1, \ldots, 2k+1$.
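
The tail probabilities (17.6.32) and (17.6.33) are finite sums and are easily evaluated. The Python sketch below is an illustration with hypothetical function names; the upper limit $r$ is handled by stopping as soon as the base of the power is no longer positive, and in (17.6.33) the sum is also stopped at $p = k$ so that $(k-p)!$ is defined.

```python
import math

def fisher_g_tail(g_prime, k):
    """P(g > g') from (17.6.32) for the largest of k periodogram ratios."""
    total = 0.0
    for p in range(k):                       # p = 0, ..., k - 1
        base = 1.0 - (p + 1) * g_prime
        if base <= 0.0:                      # past r: 1 - (p + 1) g' is no longer positive
            break
        total += (-1) ** p * math.comb(k, p + 1) * base ** (k - 1)
    return total

def mth_largest_tail(g_prime, k, m):
    """P(g > g') from (17.6.33) when g is the m-th largest ratio."""
    total = 0.0
    for p in range(m, k + 1):
        base = 1.0 - p * g_prime
        if base <= 0.0:
            break
        total += ((-1) ** (p - m) * base ** (k - 1)
                  / (p * math.factorial(k - p) * math.factorial(p - m)))
    return math.factorial(k) / math.factorial(m - 1) * total

print(fisher_g_tail(0.75, 2), mth_largest_tail(0.2, 2, 2))
```

For $k = 2$ the ratio $u_1/(u_1 + u_2)$ is uniformly distributed on (0, 1), which gives a check by hand: the larger ratio exceeds 0.75 with probability 0.5, and the smaller exceeds 0.2 with probability 0.6.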


17.7 TESTING A NORMAL NOISE FOR WHITENESS 

Suppose we have a sample $(x_1, \ldots, x_n)$ from a normal time series $\{x_t\}$
and that we wish to test the hypothesis that the time series is a white noise,
that is, that $x_1, \ldots, x_n$ are independent random variables having identical
normal distributions $N(\mu, \sigma^2)$. A criterion which has been proposed by
R. L. Anderson (1942) for testing this hypothesis is the ratio

(17.7.1)  $R = \frac{c_1'}{c_0'}$

where $c_h' = \sum_{\xi=1}^{n}(x_\xi - \bar{x})(x_{\xi+h} - \bar{x})$, $h = 0, 1$,
and $x_{n+1} \equiv x_1$. Making $c_1'$ circular by putting $x_{n+1} = x_1$ as contrasted with
running the summation from $\xi = 1$ to $n - 1$, even if it looks arbitrary,
makes for considerable simplicity in dealing with the distribution theory
of $R$. If the sample comes from a white noise then for large $n$, $R$ tends to
have values near 0; if not, $R$ tends to have values away from 0.

Now $c_1'$ and $c_0'$ are quadratic forms in $x_1, \ldots, x_n$. The difference
$c_1' - R'c_0'$ where $R'$ is constant, is also a quadratic in $x_1, \ldots, x_n$. We can
solve the distribution problem of $R$ by determining the distribution of the
quadratic $c_1' - R'c_0'$ since

(17.7.2)  $P(c_1' - R'c_0' > 0) = P(R > R').$

If $(x_1, \ldots, x_n)$ is from normal white noise, the characteristic function of
$c_1' - R'c_0'$ is given by

(17.7.3)  $\varphi(t) = \left(\frac{1}{\sqrt{2\pi}}\right)^n\int_{R_n}\exp\left[i(c_1' - R'c_0')t - \frac{1}{2}\sum_{\xi=1}^{n}(x_\xi - \mu)^2\right]dx_1\cdots dx_n,$

where, since $R$ is unaffected by a change of scale, we take $\sigma^2 = 1$.
Using methods and results of Section 8.4 and noting that $c_1' - R'c_0'$ is also
a quadratic in $(x_1 - \mu), \ldots, (x_n - \mu)$ we find

(17.7.4)  $\varphi(t) = \prod_{\xi=1}^{n-1}[1 - 2it(d_\xi - R')]^{-1/2}$

where $d_\xi = \cos(2\pi\xi/n)$, $\xi = 1, \ldots, n - 1$. Since $d_\xi = d_{n-\xi}$ we can write

(17.7.5)  $\varphi(t) = \prod_{\xi=1}^{(n-1)/2}[1 - 2it(d_\xi - R')]^{-1}$

where for immediate convenience $n$ is assumed to be odd. The probability
density function of $c_1' - R'c_0' = y$, say, is provided by 5.1.2, that is,

(17.7.6)  $g(y) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-iyt}\varphi(t)\,dt.$

Now $\varphi(z)$ has simple poles at the points $z_\xi = -\frac{1}{2}i/(d_\xi - R')$, $\xi = 1, \ldots,
(n - 1)/2$, in the complex plane. Thus, using the method of residues to
evaluate the integral in (17.7.6) we obtain, for $y > 0$,

(17.7.7)  $g(y) = \frac{1}{2}\sum_{d_\xi > R'} A_\xi\exp\left[-\frac{y}{2(d_\xi - R')}\right]$

where

(17.7.8)  $A_\xi = \frac{(d_\xi - R')^{(n-5)/2}}{\prod_{\eta \neq \xi}(d_\xi - d_\eta)}.$

Since $P(R > R') = P(y > 0)$, we find, therefore, that

(17.7.9)  $P(R > R') = \int_0^\infty g(y)\,dy = \sum_{d_\xi > R'} A_\xi(d_\xi - R')$

from which critical values of $R'$ can be obtained for given levels of significance.
The method has been extended by R. L. Anderson to the case of
even values of $n$. He has provided tables of $R_\alpha$ for which $P(R > R_\alpha) = \alpha$,
for $\alpha = 0.99, 0.95, 0.05, 0.01$, and $n = 5(1)15(5)75$. Further studies of
serial correlation problems have been made by T. W. Anderson (1948),
Dixon (1944), Durbin and Watson (1951), Hannan (1955), Koopmans
(1942), Moran (1948), Ogawara (1951), Quenouille (1948), Whittle (1951),
and others.
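
For odd $n$ the ratio (17.7.1) and the tail probability (17.7.9) can be computed in a few lines. The Python sketch below is illustrative only: the function names are hypothetical, and the second function uses (17.7.8) and (17.7.9) in the combined form $\sum (d_\xi - R')^{(n-3)/2}/\prod_{\eta\neq\xi}(d_\xi - d_\eta)$.

```python
import math

def circular_serial_correlation(x):
    """R = c'_1 / c'_0 with the circular convention x_{n+1} = x_1, as in (17.7.1)."""
    n = len(x)
    xbar = sum(x) / n
    d = [v - xbar for v in x]
    c0 = sum(v * v for v in d)
    c1 = sum(d[i] * d[(i + 1) % n] for i in range(n))   # index wraps around at the end
    return c1 / c0

def anderson_tail(r_prime, n):
    """P(R > R') from (17.7.9) for odd n under normal white noise."""
    if n % 2 == 0 or n < 3:
        raise ValueError("this sketch covers odd n >= 3 only")
    half = (n - 1) // 2
    d = [math.cos(2 * math.pi * j / n) for j in range(1, half + 1)]
    total = 0.0
    for j, dj in enumerate(d):
        if dj > r_prime:                                 # only terms with d_xi > R' enter
            denom = math.prod(dj - dl for l, dl in enumerate(d) if l != j)
            total += (dj - r_prime) ** ((n - 3) // 2) / denom
    return total

print(circular_serial_correlation([1.0, 2.0, 4.0]), anderson_tail(0.0, 5))
```

Two checks are available by hand. For $n = 3$ the circular ratio is identically $-1/2$, and for $n = 5$ with $R' = 0$ only $d_1 = \cos 72°$ exceeds $R'$, so the probability is $d_1/(d_1 - d_2)$.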

17.8 LINEAR PREDICTION IN TIME SERIES 

Suppose we have a time series $x_t$, $t = \ldots, -2, -1, 0$, with spectral
density function $f(\omega)$ and wish to predict $x_1$ linearly by least squares from
the entire past history of the series. A mathematical question which arises
is this: What are conditions under which the least squares predictor for
$x_1$ is nondeterministic, that is, has positive mean square error as against
conditions under which the predictor is deterministic, that is, has 0 mean
square error? The answer to this question has been given by Kolmogorov
(1941) and Wiener (1949). We shall discuss the problem briefly by using
elementary methods of analysis.

Suppose $\hat{x}_{1,n}$ is the predictor for $x_1$ based on the history $x_{-n}, \ldots, x_{-1}, x_0$.
Then $\hat{x}_{1,n}$ is given by

(17.8.1)  $\hat{x}_{1,n} = \sum_{t=-n}^{0} a_{t,n}x_t$

where $a_{t,n}$, $t = -n, \ldots, -1, 0$, are the values of the coefficients which minimize

(17.8.2)  $\mathscr{E}\left(x_1 - \sum_{t=-n}^{0} a_{t,n}x_t\right)^2.$

Using the method of least squares of Section 3.8 we find

(17.8.3) 

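where $\|\sigma^{ij}\| = \|\sigma_{ij}\|^{-1}$ and $\|\sigma_{ij}\|$ is the $(n + 1) \times (n + 1)$ matrix

(17.8.4)  $\begin{Vmatrix}\gamma_0 & \gamma_1 & \cdots & \gamma_n\\ \gamma_1 & \gamma_0 & \cdots & \gamma_{n-1}\\ \vdots & \vdots & & \vdots\\ \gamma_n & \gamma_{n-1} & \cdots & \gamma_0\end{Vmatrix}$

and where it will be recalled that

(17.8.5)  $\gamma_h = \gamma_{-h} = \int_{-\pi}^{\pi}\cos h\omega\,f(\omega)\,d\omega.$

If we let $\Delta_{n+1}$ denote the determinant of $\|\sigma_{ij}\|$, with $\Delta_{n+2}$ being similarly
defined, then the mean square error of the predictor $\hat{x}_{1,n}$ given in (17.8.1) is

(17.8.6)  $\mathscr{E}(x_1 - \hat{x}_{1,n})^2 = \frac{\Delta_{n+2}}{\Delta_{n+1}}.$

Now we have for $\xi = 1, \ldots, n$,

(17.8.7)  $\frac{\Delta_{n+2+\xi}}{\Delta_{n+1+\xi}} \leq \frac{\Delta_{n+2}}{\Delta_{n+1}} \leq \frac{\Delta_{n+2-\xi}}{\Delta_{n+1-\xi}}$

and hence

(17.8.8)  $\prod_{\xi=1}^{n}\frac{\Delta_{n+2+\xi}}{\Delta_{n+1+\xi}} \leq \left(\frac{\Delta_{n+2}}{\Delta_{n+1}}\right)^n \leq \prod_{\xi=1}^{n}\frac{\Delta_{n+2-\xi}}{\Delta_{n+1-\xi}}$

from which we obtain

(17.8.9)  $\frac{\Delta_{2n+2}}{\Delta_{n+2}} \leq \left(\frac{\Delta_{n+2}}{\Delta_{n+1}}\right)^n \leq \frac{\Delta_{n+1}}{\Delta_1}.$

Taking logarithms and denoting $\Delta_{n+2}/\Delta_{n+1}$ by $\sigma_n^2$ we find

(17.8.10)  $A_n \leq \log\sigma_n^2 \leq B_n$

where

(17.8.11)
$A_n = \frac{1}{n}\sum_{\xi=1}^{2n+2}\log\theta_{\xi,2n+2} - \frac{1}{n}\sum_{\xi=1}^{n+2}\log\theta_{\xi,n+2},$
$B_n = \frac{1}{n}\sum_{\xi=1}^{n+1}\log\theta_{\xi,n+1} - \frac{1}{n}\log\gamma_0,$

where $(\theta_{1,2n+2}, \ldots, \theta_{2n+2,2n+2})$ are the latent roots of the matrix whose
determinant is $\Delta_{2n+2}$, with similar definitions for $(\theta_{1,n+2}, \ldots, \theta_{n+2,n+2})$
and $(\theta_{1,n+1}, \ldots, \theta_{n+1,n+1})$. Since all three matrices are positive definite
with diagonal elements all equal to $\gamma_0$, all roots in each of the three sets lie
on the interval $(0, \gamma_0)$ and hence the logarithms of the roots in each set
lie on $(-\infty, \log\gamma_0)$.

If the average of the logarithms of the roots of matrix (17.8.4) has a
finite limit $K$, as $n \to \infty$, then it is evident that

$\lim_{n\to\infty} A_n = \lim_{n\to\infty} B_n = K.$

Hence, if we let

(17.8.12)  $\sigma^2 = \lim_{n\to\infty}\mathscr{E}(x_1 - \hat{x}_{1,n})^2$

we have

(17.8.13)  $\sigma^2 = e^K.$

If $K$ is not finite, its value is $-\infty$, in which case $\sigma^2 = 0$, and $\hat{x}_{1,n}$ has a
mean square error for predicting $x_1$ which $\to 0$ as $n \to \infty$.

It can be shown, further, by more advanced methods of analysis, which
will not be given here, that if $K$ is finite its value is given by

(17.8.14)  $K = \log 2\pi + \frac{1}{2\pi}\int_{-\pi}^{\pi}\log f(\omega)\,d\omega.$

The reader interested in the proof is referred to Kolmogorov (1941)
and Wiener (1949). Proofs can also be found in the books by Doob (1953),
and by Grenander and Rosenblatt (1957).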
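
Formula (17.8.14) can be evaluated by numerical integration. The Python sketch below is an illustration under stated assumptions: the function name and the midpoint rule are choices made here, and the example density is the sum of the Fourier series (17.4.3) for the hypothetical covariance function $\gamma_h = \gamma_0\rho^{|h|}$, for which $e^K$ should reproduce the value $\gamma_0(1 - \rho^2)$ stated in Problem 17.8.

```python
import math

def log_prediction_constant(f, steps=20000):
    """K = log(2*pi) + (1/(2*pi)) * integral over [-pi, pi] of log f(w), midpoint rule."""
    h = 2.0 * math.pi / steps
    integral = sum(math.log(f(-math.pi + (i + 0.5) * h)) for i in range(steps)) * h
    return math.log(2.0 * math.pi) + integral / (2.0 * math.pi)

# Hypothetical example: gamma_h = gamma0 * rho**|h|, whose series (17.4.3) sums to
# f(w) = gamma0 * (1 - rho^2) / (2*pi*(1 + rho^2 - 2*rho*cos w)).
gamma0, rho = 2.0, 0.6
f = lambda w: gamma0 * (1 - rho ** 2) / (2 * math.pi * (1 + rho ** 2 - 2 * rho * math.cos(w)))

sigma2 = math.exp(log_prediction_constant(f))   # limiting mean square error, (17.8.13)
print(sigma2, gamma0 * (1 - rho ** 2))
```

For a white noise, $f(\omega) = \gamma_0/2\pi$ and the same computation returns $\sigma^2 = \gamma_0$: the past is of no help in prediction.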

Another problem involving linear prediction in time series is the
autoregressive scheme which we shall mention briefly. An autoregressive
scheme of order $r$ for a given time series $\{x_t\}$ states that for constants
$a_1, \ldots, a_r$, the time series

(17.8.15) -- 

is a white noise. Least squares estimates for $a_1, \ldots, a_r$ can be found
from a sample $(x_1, \ldots, x_n)$ in the usual way. Dixon (1944) has considered
the problem of testing the hypothesis that $a_1, \ldots, a_r$ are all zero against
the alternative that $a_1, \ldots, a_r$ are not all 0, using the method of likelihood
ratios and assuming the series (17.8.15) is a normal white noise. Other studies of
autoregressive problems have been considered by Mann and Wald (1943).
Wold (1938) has considered the problem of testing the hypothesis that a 
time series {x^} is of form y\ + + • * * + where {y^ is a white 

noise. 

PROBLEMS 

17.1 Suppose a stationary time series $x_t$, $t = \ldots, -1, 0, +1, \ldots$ has
spectral density function

$f(\omega) = \frac{1}{\pi^2}(\pi - |\omega|), \quad -\pi < \omega < \pi.$

Show that the covariance function is as follows:

$\gamma_h = 1$ for $h = 0$; $\gamma_h = \left(\frac{2}{\pi h}\right)^2$ for $h$ odd; $\gamma_h = 0$ for $h$ even, $h \neq 0$.

17.2 (Slutzky's (1937) theorem) If $x_t$, $t = \ldots, -1, 0, +1, \ldots$ is a real
stationary time series with spectral density function $f(\omega)$, show that the smoothed
time series

$y_t = \sum_{p=0}^{k} a_p x_{t+p}, \quad t = \ldots, -1, 0, +1, \ldots$

where the $a$'s are real constants is a stationary process which has spectral
density function

$\left[\sum_{p=0}^{k}\sum_{q=0}^{k} a_p a_q\cos(p - q)\omega\right]f(\omega).$

In particular, if $a_p = 1/(k + 1)$, $p = 0, 1, \ldots, k$, that is, if the smoothing process
is to take a moving average of $k + 1$ elements of the time series at a time, show
that the spectral density function of the resulting time series is

$\frac{1 - \cos(k + 1)\omega}{(k + 1)^2(1 - \cos\omega)}\,f(\omega).$

17.3 (Continuation) If the stationary time series $x_t$, $t = \ldots, -1, 0, +1, \ldots$
is repeatedly smoothed $r$ times by the formula given in the preceding problem,
show that the resulting time series is stationary with spectral density function

$\left[\sum_{p=0}^{k}\sum_{q=0}^{k} a_p a_q\cos(p - q)\omega\right]^r f(\omega).$

17.4 Suppose $\varepsilon_t$, $t = \ldots, -1, 0, +1, \ldots$ is a white noise with unit variance.
Show that the time series generated by taking the $n$th (forward) difference of this
time series has covariance function

$\gamma_h = (-1)^h\frac{(2n)!}{(n - h)!\,(n + h)!}$ for $-n \leq h \leq n$, and $\gamma_h = 0$ otherwise.


17.5 Suppose 
series such that 


., “1,0, +1 ,... is a linear process^ that is, a time 


+ 00 

** _ p^p 

—00 


, 00 

where the a’s are real numbers such that ^ < + oo, and the e’s are inde- 

PSM—CO 

pendent random variables having zero means and unit variances. 

Show that the spectral density function / (a>) of this time series is given by 

J +00 +00 

/(«>)-=- 2 2 a^p+qCOsqo. 

— x P— — oo 


17.6 If Xt,t * ..., —1,0, +1,... is a stationary time series with ^{x^) * 0, 
show that its spectral distribution F(co) is given by lim Fj^{a}) where 


1 r*“ 


and also by lim F^{o>) where 

M-*co 


4'(xi sin 0 / + 


M-*ao 

4 - x^ sin A/ct»)* dm 


FSr{(o) 


1 


^Im{^) dm 
where $I_M(\omega)$ is the periodogram defined by




1 

IttA/ 


M 

p=i 


|2 


17.7 If ajj, r =s ..., —1,0, +1 ,... is a complex stationary time series, 
show that the covariance function 


Vh ^ A)* ^ — • • • » ““!» 0, +1, . . . 

is given by (17.3.11), where F{(u) = lim and Fjf(a>) is given by (17.3.13). 

M-*ao 

17.8 Let $x_t$, $t = \ldots, -1, 0, +1, \ldots$ be a stationary time series whose
covariance function is given by

$\gamma_h = \gamma_0\rho^h, \quad 0 < \rho < +1, \quad h = 0, 1, \ldots.$

Show that if $\sigma^2$ is the variance of the least squares linear estimate of $x_1$ from
$x_{-n+1}, \ldots, x_0$, then $\sigma^2 = \gamma_0(1 - \rho^2)$.

17.9 {Continuation). Show that the spectral density function of the time 
series in Problem 17.8 is given by 

.. V ^ yo(l ~pCOSa>) 

7r(l + /)* — 2p COS o>) 

17.10 Establish formula (17.6.33) by methods similar to those used in 
developing formula (17.6.32). 



CHAPTER 18 


Multivariate Statistical Theory 


A branch of mathematical statistics that has been quite thoroughly
developed during the last quarter of a century is linear and quadratic
analysis of samples from multidimensional distributions. This branch has
come to be known as multivariate statistical analysis. Most of the sampling
theory and statistical tests that have been developed for problems in this
field are for samples from multivariate normal distributions. There is
a large body of results on multivariate statistical theory scattered throughout
the literature of mathematical statistics. In this type of book it is not
feasible to cover all the important and interesting results of this literature.
We shall therefore confine ourselves to the main ideas and principal
results of multivariate statistical theory. The reader interested in further
mathematical details should consult the recent books by Anderson (1958)
and Roy (1957) on this subject. Details concerning applications of these
methods to numerical problems can be found in books by Rao (1952) and
Kendall (1957). Anderson's book has a comprehensive bibliography on
multivariate statistical analysis.

18.1 MULTIDIMENSIONAL STATISTICAL SCATTER

By way of introduction to multivariate statistical analysis it will be 
convenient to define and discuss what we shall call the scatter of a sample 
about a point. 

(a) The One-Dimensional Case 

First of all, suppose $(x_1, \ldots, x_n)$ is a sample from a one-dimensional
c.d.f. $F(x)$. Let $x_0$ be an arbitrary real number. Note that for any given
values of $x_1, \ldots, x_n$ and $x_0$ they can be represented as points along the real
line. Then the scatter ${}_1S_{x_0,n}$ of the sample about the pivotal point $x_0$ is
defined as follows:

(18.1.1)  ${}_1S_{x_0,n} = \sum_{\xi=1}^{n}(x_\xi - x_0)^2 = {}_1S_{\bar{x},n} + n(\bar{x} - x_0)^2.$

Note that ${}_1S_{x_0,n}$ is merely the sum of squares of the $n$ possible segments
$(x_\xi - x_0)$ generated by taking $x_0$ and each $x_\xi$ in the sample. Note also
that the scatter of the sample about $x_0$ is equal to the scatter of the sample
about its mean $\bar{x}$ plus the weighted scatter of $\bar{x}$ about $x_0$, the weight being
the sample size $n$. Thus, the scatter of a sample is a minimum when taken
about the sample mean. We also know from Section 8.2 that if $F(x)$ has
mean $\mu$ and variance $\sigma^2$ then

(18.1.2)  $\mathscr{E}({}_1S_{x_0,n}) = n[\sigma^2 + (\mu - x_0)^2]$

from which it is evident that the value of $\mathscr{E}({}_1S_{x_0,n})$ is a minimum if $x_0$ is
chosen as $\mu$.

In the language of mechanics, ${}_1S_{x_0,n}$ is the moment of inertia of the
sample points $x_1, \ldots, x_n$ about the pivotal point $x_0$.

Now suppose we have two samples $(x_1^{(1)}, \ldots, x_{n_1}^{(1)})$ and $(x_1^{(2)}, \ldots, x_{n_2}^{(2)})$
from one-dimensional distributions $F_1(x)$ and $F_2(x)$, respectively. Let
$\bar{x}^{(1)}$, $\bar{x}^{(2)}$ be the two sample means, $\bar{x}$ the mean of both samples pooled
together, ${}_1S^{(1)}$, ${}_1S^{(2)}$ the scatters of the samples about their respective
means as pivotal points, and ${}_1S$ the scatter of the pooled sample
about $\bar{x}$ as the pivotal point. By pooled sample we mean the sample of
size $n_1 + n_2$ obtained by regarding the two samples as one single (grand)
sample. The reader can verify that

(18.1.3)  ${}_1S = {}_1S^{(1)} + {}_1S^{(2)} + n_1(\bar{x}^{(1)} - \bar{x})^2 + n_2(\bar{x}^{(2)} - \bar{x})^2.$

Thus, the scatter of the pooled sample about $\bar{x}$ has as its minimum value
under rigid translations of the elements in each sample, the sum of the
scatters of the two samples about their respective means, this minimum
occurring only when the two sample means coincide. The result expressed
by (18.1.3) extends immediately to any number of samples.
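
The decompositions (18.1.1) and (18.1.3) can be verified numerically on any data. The Python sketch below uses two small illustrative samples; the function names and the numbers are assumptions made for the example.

```python
def scatter(sample, pivot):
    """One-dimensional scatter about `pivot`: sum of squared deviations, as in (18.1.1)."""
    return sum((v - pivot) ** 2 for v in sample)

def mean(sample):
    return sum(sample) / len(sample)

s1 = [1.0, 2.0, 3.0]              # illustrative samples only
s2 = [4.0, 6.0, 8.0, 10.0]
pooled = s1 + s2
xbar = mean(pooled)

# Left and right members of (18.1.3).
lhs = scatter(pooled, xbar)
rhs = (scatter(s1, mean(s1)) + scatter(s2, mean(s2))
       + len(s1) * (mean(s1) - xbar) ** 2 + len(s2) * (mean(s2) - xbar) ** 2)
print(lhs, rhs)
```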

(b) The Two-Dimensional Case 

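Now let us consider a sample $(x_{1\xi}, x_{2\xi};\ \xi = 1, \ldots, n)$ of size $n$ from a
two-dimensional c.d.f. $F(x_1, x_2)$, and let $(x_{10}, x_{20})$ be an arbitrary pair of
numbers. Note that for a given sample, the sample, together with $(x_{10}, x_{20})$,
can be represented as a cluster of points (at most $n + 1$) in a plane. It will
be convenient to call this a sample cluster. Take any two elements (pairs)
in the sample, say $(x_{1\xi}, x_{2\xi})$ and $(x_{1\eta}, x_{2\eta})$, and the point $(x_{10}, x_{20})$. These
three pairs can in general be represented by three points in an $x_1x_2$-plane.
Unless the three points are collinear, a fourth point can be chosen in three
different ways so that the three original points and any one of the choices
for the fourth point form the vertices of a parallelogram. The three possible
parallelograms all have the same area (except possibly for sign). It
will be convenient to call the absolute value of this area the two-dimensional
content determined by $(x_{1\xi}, x_{2\xi})$, $(x_{1\eta}, x_{2\eta})$, and $(x_{10}, x_{20})$. The squared value of
this content is

(18.1.4) $$\Delta_{\xi\eta} = \begin{vmatrix} x_{1\xi} - x_{10} & x_{2\xi} - x_{20} \\ x_{1\eta} - x_{10} & x_{2\eta} - x_{20} \end{vmatrix}^2.$$

Squaring the determinant on the right, we find that it may be written as the
sum of four determinants, namely,

(18.1.5) $$\Delta_{\xi\eta} = A_{\xi\xi} + A_{\xi\eta} + A_{\eta\xi} + A_{\eta\eta},$$

where

(18.1.6) $$A_{\xi\eta} = \begin{vmatrix} (x_{1\xi} - x_{10})^2 & (x_{1\eta} - x_{10})(x_{2\eta} - x_{20}) \\ (x_{2\xi} - x_{20})(x_{1\xi} - x_{10}) & (x_{2\eta} - x_{20})^2 \end{vmatrix},$$

with similar definitions for $A_{\xi\xi}$, $A_{\eta\xi}$, and $A_{\eta\eta}$. Note that $A_{\xi\xi}$ and $A_{\eta\eta}$ are
both 0.

The pair $(\xi, \eta)$ can be any two of the integers $1, \ldots, n$. There are $\binom{n}{2}$
such pairs, and the sum of $\Delta_{\xi\eta}$ over all such choices is $\sum_{\xi < \eta} (A_{\xi\eta} + A_{\eta\xi})$,
since $A_{\xi\xi} = 0$. But this sum is equivalent to the sum of $A_{\xi\eta}$
for $\xi, \eta = 1, \ldots, n$, that is, $\sum_{\eta=1}^{n} \sum_{\xi=1}^{n} A_{\xi\eta}$. Denoting this sum by ${}_2S_{x_0,n}$
and noting that for any set of determinants
$\begin{vmatrix} a_\xi & b_\eta \\ c_\xi & d_\eta \end{vmatrix}$, $\xi, \eta = 1, \ldots, n$,
the relation

(18.1.7) $$\sum_{\xi} \sum_{\eta} \begin{vmatrix} a_\xi & b_\eta \\ c_\xi & d_\eta \end{vmatrix} = \begin{vmatrix} \sum_\xi a_\xi & \sum_\eta b_\eta \\ \sum_\xi c_\xi & \sum_\eta d_\eta \end{vmatrix}$$

holds, we therefore have the result

(18.1.8) $${}_2S_{x_0,n} = \begin{vmatrix} \sum_\xi (x_{1\xi} - x_{10})^2 & \sum_\eta (x_{1\eta} - x_{10})(x_{2\eta} - x_{20}) \\ \sum_\xi (x_{2\xi} - x_{20})(x_{1\xi} - x_{10}) & \sum_\eta (x_{2\eta} - x_{20})^2 \end{vmatrix},$$

where $\sum_\xi$ and $\sum_\eta$ each denote summation over the values $1, \ldots, n$.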




The matrix whose determinant is ${}_2S_{x_0,n}$ is sometimes called a Gramian
matrix, that is, a matrix product $A'A$ where $A$ is the matrix

(18.1.9) $$A = \begin{Vmatrix} x_{11} - x_{10} & x_{21} - x_{20} \\ \vdots & \vdots \\ x_{1n} - x_{10} & x_{2n} - x_{20} \end{Vmatrix}$$

and $A'$ is the transpose of $A$.

The quantity ${}_2S_{x_0,n}$ is the scatter of the sample $(x_{1\xi}, x_{2\xi};\ \xi = 1, \ldots, n)$
about the pivotal point $(x_{10}, x_{20})$; it is the sum of squares of the two-dimensional
contents determined by $(x_{10}, x_{20})$ and all possible pairs of points
in the sample $(x_{1\xi}, x_{2\xi};\ \xi = 1, \ldots, n)$. The matrix whose determinant is
${}_2S_{x_0,n}$ is called the scatter matrix of the sample about $(x_{10}, x_{20})$.
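
The two descriptions of the two-dimensional scatter, as a sum of squared parallelogram areas over all pairs of sample points and as the determinant of the Gramian matrix $A'A$, can be compared directly. The Python sketch below is illustrative only; the function names and the small data set are assumptions made for the example.

```python
from itertools import combinations


def content_squared(p, q, pivot):
    """Squared area of the parallelogram spanned by p - pivot and q - pivot."""
    a1, a2 = p[0] - pivot[0], p[1] - pivot[1]
    b1, b2 = q[0] - pivot[0], q[1] - pivot[1]
    return (a1 * b2 - a2 * b1) ** 2


def scatter_by_contents(points, pivot):
    """Sum of squared two-dimensional contents over all pairs of sample points."""
    return sum(content_squared(p, q, pivot) for p, q in combinations(points, 2))


def scatter_by_gram(points, pivot):
    """Determinant of the 2 x 2 scatter (Gramian) matrix about the pivot."""
    s11 = sum((p[0] - pivot[0]) ** 2 for p in points)
    s22 = sum((p[1] - pivot[1]) ** 2 for p in points)
    s12 = sum((p[0] - pivot[0]) * (p[1] - pivot[1]) for p in points)
    return s11 * s22 - s12 ** 2


points = [(0, 0), (2, 1), (1, 3), (4, 2)]
pivot = (1, 1)
by_contents = scatter_by_contents(points, pivot)
by_gram = scatter_by_gram(points, pivot)
```

For this data set both computations give 50.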

If we let $\bar{x}_i = \frac{1}{n} \sum_{\xi=1}^{n} x_{i\xi}$, $i = 1, 2$, and

(18.1.10) $$u_{ij} = u_{ji} = \sum_{\xi=1}^{n} (x_{i\xi} - \bar{x}_i)(x_{j\xi} - \bar{x}_j), \quad i, j = 1, 2,$$

we find that

(18.1.11) $$\sum_{\xi=1}^{n} (x_{i\xi} - x_{i0})(x_{j\xi} - x_{j0}) = u_{ij} + n(\bar{x}_i - x_{i0})(\bar{x}_j - x_{j0}).$$

Therefore, we have

(18.1.12) $${}_2S_{x_0,n} = \begin{vmatrix} u_{11} + n(\bar{x}_1 - x_{10})^2 & u_{12} + n(\bar{x}_1 - x_{10})(\bar{x}_2 - x_{20}) \\ u_{21} + n(\bar{x}_2 - x_{20})(\bar{x}_1 - x_{10}) & u_{22} + n(\bar{x}_2 - x_{20})^2 \end{vmatrix},$$

which is the two-dimensional analogue of (18.1.1). Writing the determinant
on the right as the sum of four determinants in the usual way, we find
that ${}_2S_{x_0,n}$ can be expressed as follows:

(18.1.13) $${}_2S_{x_0,n} = |u_{ij}| \cdot \Big[ 1 + n \sum_{i,j=1}^{2} u^{ij} (\bar{x}_i - x_{i0})(\bar{x}_j - x_{j0}) \Big],$$

where $\|u^{ij}\| = \|u_{ij}\|^{-1}$, this inverse matrix existing, of course, only if
the scatter matrix of the sample about $(\bar{x}_1, \bar{x}_2)$ is nonsingular with probability 1.
It will be convenient to call $\|u_{ij}\|$ the internal scatter matrix
and its determinant $|u_{ij}|$ the internal scatter of the sample. It will be noted
that $\|u_{ij}\|$ is nonsingular if and only if the $n$ points $(x_{1\xi}, x_{2\xi};\ \xi = 1, \ldots, n)$
in the sample are noncollinear. Thus, if $\|u_{ij}\|$ is nonsingular, the quadratic
form inside [ ] on the right of (18.1.13) is positive definite, in which case
it is evident that

18.1.1 The choice of the pivotal point $(x_{10}, x_{20})$ which minimizes the scatter
${}_2S_{x_0,n}$ of the sample about $(x_{10}, x_{20})$ is the sample mean $(\bar{x}_1, \bar{x}_2)$, in which
case the minimum scatter is the internal scatter of the sample
given by

(18.1.14) $${}_2S_{\bar{x},n} = \begin{vmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{vmatrix}.$$

Next we shall show that


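18.1.2 The mean value of the two-dimensional scatter ${}_2S_{x_0,n}$ about $(x_{10}, x_{20})$ is
given by

(18.1.15) $$\mathscr{E}({}_2S_{x_0,n}) = 2\binom{n}{2} \begin{vmatrix} \sigma_{11} + (\mu_1 - x_{10})^2 & \sigma_{12} + (\mu_1 - x_{10})(\mu_2 - x_{20}) \\ \sigma_{21} + (\mu_2 - x_{20})(\mu_1 - x_{10}) & \sigma_{22} + (\mu_2 - x_{20})^2 \end{vmatrix},$$

where $\|\sigma_{ij}\|$ is the covariance matrix and $(\mu_1, \mu_2)$ is the vector of
means of $(x_1, x_2)$. The minimum of $\mathscr{E}({}_2S_{x_0,n})$ occurs if and only if
$(x_{10}, x_{20}) = (\mu_1, \mu_2)$, in which case $\mathscr{E}({}_2S_{\mu,n}) = 2\binom{n}{2} |\sigma_{ij}|$.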

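Formula (18.1.15) is the two-dimensional analogue of (18.1.2). To
establish this result we first write down the expression defining $\mathscr{E}(A_{\xi\eta})$,
namely,

(18.1.16) $$\mathscr{E}(A_{\xi\eta}) = \int_{R_2} \int_{R_2} A_{\xi\eta}\, dF(x_{1\xi}, x_{2\xi})\, dF(x_{1\eta}, x_{2\eta}).$$

Noting that (18.1.7) holds for integrals as well as finite sums, we may therefore
write the quantity on the right of (18.1.16) as a determinant $|D_{ij}|$ where

$$D_{21} = \int_{R_2} (x_{2\xi} - x_{20})(x_{1\xi} - x_{10})\, dF(x_{1\xi}, x_{2\xi}),$$

with similar expressions for $D_{11}$, $D_{12}$, and $D_{22}$. Thus, we have

$$D_{ij} = \mathscr{E}[(x_i - x_{i0})(x_j - x_{j0})] = \sigma_{ij} + (\mu_i - x_{i0})(\mu_j - x_{j0}).$$

Therefore, for $\xi \neq \eta = 1, \ldots, n$,

$$\mathscr{E}(A_{\xi\eta}) = \begin{vmatrix} \sigma_{11} + (\mu_1 - x_{10})^2 & \sigma_{12} + (\mu_1 - x_{10})(\mu_2 - x_{20}) \\ \sigma_{21} + (\mu_2 - x_{20})(\mu_1 - x_{10}) & \sigma_{22} + (\mu_2 - x_{20})^2 \end{vmatrix}.$$

But $\mathscr{E}(\Delta_{\xi\eta}) = \mathscr{E}(A_{\xi\xi}) + \mathscr{E}(A_{\xi\eta}) + \mathscr{E}(A_{\eta\xi}) + \mathscr{E}(A_{\eta\eta}) = 2\mathscr{E}(A_{\xi\eta})$.
Furthermore, ${}_2S_{x_0,n}$ consists of the sum of all possible $\Delta_{\xi\eta}$, the number
of these being $\binom{n}{2}$. Hence

$$\mathscr{E}({}_2S_{x_0,n}) = 2\binom{n}{2} \mathscr{E}(A_{\xi\eta}),$$

which is seen to be equal to the right-hand side of (18.1.15), thereby completing
the argument for 18.1.2.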
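
The statement of 18.1.2 can be verified exactly for a small discrete distribution by enumerating every possible sample. The Python sketch below does this under stated assumptions: the four-point distribution, the pivotal point, and all function names are hypothetical choices made only for illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb


def scatter2(points, pivot):
    """Two-dimensional scatter: determinant of the scatter matrix about pivot."""
    s = [[sum((p[i] - pivot[i]) * (p[j] - pivot[j]) for p in points)
          for j in range(2)] for i in range(2)]
    return s[0][0] * s[1][1] - s[0][1] * s[1][0]


def expected_scatter_exact(support, probs, n, pivot):
    """Mean scatter of an i.i.d. sample of size n, by full enumeration."""
    total = Fraction(0)
    for idx in product(range(len(support)), repeat=n):
        weight = Fraction(1)
        for i in idx:
            weight *= probs[i]
        total += weight * scatter2([support[i] for i in idx], pivot)
    return total


def expected_scatter_formula(support, probs, n, pivot):
    """Right-hand side of the mean-scatter formula: 2 C(n,2) |sigma_ij + shift|."""
    mu = [sum(p * x[i] for p, x in zip(probs, support)) for i in range(2)]
    sigma = [[sum(p * (x[i] - mu[i]) * (x[j] - mu[j])
                  for p, x in zip(probs, support))
              for j in range(2)] for i in range(2)]
    d = [[sigma[i][j] + (mu[i] - pivot[i]) * (mu[j] - pivot[j])
          for j in range(2)] for i in range(2)]
    return 2 * comb(n, 2) * (d[0][0] * d[1][1] - d[0][1] * d[1][0])


support = [(0, 0), (1, 0), (0, 1), (2, 3)]
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
pivot = (1, -1)
exact = expected_scatter_exact(support, probs, 3, pivot)
formula = expected_scatter_formula(support, probs, 3, pivot)
```

Because the arithmetic is exact, agreement between `exact` and `formula` is an equality rather than an approximation; the enumeration grows as (number of support points)$^n$, so the check is practical only for very small $n$.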

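Note that if $\|\sigma_{ij}\|$ is positive definite then $\|\sigma^{ij}\| = \|\sigma_{ij}\|^{-1}$ exists and is
also positive definite and we can write

(18.1.17) $$\mathscr{E}({}_2S_{x_0,n}) = 2\binom{n}{2} |\sigma_{ij}| \Big[ 1 + \sum_{i,j=1}^{2} \sigma^{ij} (\mu_i - x_{i0})(\mu_j - x_{j0}) \Big].$$

Thus, it is evident from (18.1.17) that

18.1.3 If $\|\sigma_{ij}\|$ is positive definite, $\mathscr{E}({}_2S_{x_0,n})$ attains its minimum if $(x_{10}, x_{20})$
is chosen as the mean $(\mu_1, \mu_2)$ of $F(x_1, x_2)$, in which case this minimum
is

(18.1.18) $$\mathscr{E}({}_2S_{\mu,n}) = \mathscr{E}(|v_{ij}|) = 2\binom{n}{2} |\sigma_{ij}|$$

where

(18.1.19) $$v_{ij} = v_{ji} = \sum_{\xi=1}^{n} (x_{i\xi} - \mu_i)(x_{j\xi} - \mu_j), \quad i, j = 1, 2.$$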

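Now suppose $(x^{(1)}_{1\xi_1}, x^{(1)}_{2\xi_1};\ \xi_1 = 1, \ldots, n_1)$ and $(x^{(2)}_{1\xi_2}, x^{(2)}_{2\xi_2};\ \xi_2 = 1, \ldots, n_2)$
are two samples from the c.d.f.'s $F_1(x_1, x_2)$, $F_2(x_1, x_2)$, respectively. Let
$(\bar{x}^{(1)}_1, \bar{x}^{(1)}_2)$ and $(\bar{x}^{(2)}_1, \bar{x}^{(2)}_2)$ be the means of the samples and $\|u^{(1)}_{ij}\|$ and
$\|u^{(2)}_{ij}\|$, $i, j = 1, 2$, the internal scatter matrices of the two samples.
If these samples, when pooled, have mean $(\bar{x}_1, \bar{x}_2)$ it is straightforward to
verify that the scatter of the pooled samples about $(\bar{x}_1, \bar{x}_2)$ is given by

(18.1.20) $${}_2S_{\bar{x},\,n_1+n_2} = \big| u^{(1)}_{ij} + u^{(2)}_{ij} + n_1(\bar{x}^{(1)}_i - \bar{x}_i)(\bar{x}^{(1)}_j - \bar{x}_j) + n_2(\bar{x}^{(2)}_i - \bar{x}_i)(\bar{x}^{(2)}_j - \bar{x}_j) \big|.$$

The second-order determinant on the right of (18.1.20) is the two-dimensional
analogue of the right-hand side of (18.1.3).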

(c) The k-Dimensional Case

One encounters no new kinds of difficulties in defining the scatter ${}_kS_{x_0,n}$
of a sample $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$ from a $k$-dimensional c.d.f.
$F(x_1, \ldots, x_k)$ about a pivotal point $(x_{10}, \ldots, x_{k0})$. The sample and pivotal
point can be represented as $n + 1$ points in Euclidean space $R_k$, which will
be called a sample cluster. Thus for each choice of $k$ different sample
points, say $(x_{1\xi_l}, \ldots, x_{k\xi_l})$, $l = 1, \ldots, k$, together with the pivotal point
$(x_{10}, \ldots, x_{k0})$, there are $k + 1$ ways to choose a further point so that this
point together with the $k + 1$ points already mentioned form a $k$-dimensional
parallelotope. These $k + 1$ different possible parallelotopes have
equal $k$-dimensional volumes (except possibly for sign), the absolute value
of which would be called the k-dimensional content determined by
$(x_{1\xi_l}, \ldots, x_{k\xi_l})$, $l = 1, \ldots, k$, and $(x_{10}, \ldots, x_{k0})$. The scatter ${}_kS_{x_0,n}$ is then
defined as the sum of squares of the $k$-dimensional contents determined
by $(x_{10}, \ldots, x_{k0})$ and each of the $\binom{n}{k}$ different possible choices of
$(x_{1\xi_l}, \ldots, x_{k\xi_l})$, $l = 1, \ldots, k$. It is straightforward to show that ${}_kS_{x_0,n}$ is the extension of
${}_2S_{x_0,n}$, as given by (18.1.11), to $k$ dimensions, that is,

(18.1.21) $${}_kS_{x_0,n} = |u_{ij} + n(\bar{x}_i - x_{i0})(\bar{x}_j - x_{j0})|,$$

where the definition of $u_{ij}$, $i, j = 1, \ldots, k$, is evident from (18.1.10).
If $\|u_{ij}\|$, the internal scatter matrix of the sample, is positive definite,
which will occur if and only if the $n$ sample points $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$
do not lie in a flat space of less than $k$ dimensions, then ${}_kS_{x_0,n}$ can
be written in the alternative form

(18.1.22) $${}_kS_{x_0,n} = |u_{ij}| \cdot \Big[ 1 + n \sum_{i,j=1}^{k} u^{ij} (\bar{x}_i - x_{i0})(\bar{x}_j - x_{j0}) \Big],$$

where, of course, $\|u^{ij}\| = \|u_{ij}\|^{-1}$ and is positive definite if $\|u_{ij}\|$ is. We
note that (18.1.22) is the $k$-dimensional extension of (18.1.13). Since the
quadratic form inside [ ] is positive definite, the $k$-dimensional version
of 18.1.1 readily follows, that is, ${}_kS_{x_0,n}$ has its minimum value if $(x_{10}, \ldots, x_{k0})$
is chosen as the vector of sample means $(\bar{x}_1, \ldots, \bar{x}_k)$, in which case the
minimum value of ${}_kS_{x_0,n}$ is ${}_kS_{\bar{x},n}$, where

(18.1.23) $${}_kS_{\bar{x},n} = |u_{ij}|, \quad i, j = 1, \ldots, k,$$

which, of course, is the internal scatter of the sample.
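
The alternative form of the scatter, namely the internal scatter $|u_{ij}|$ multiplied by one plus $n$ times a quadratic form in the differences between the sample means and the pivotal point, can be confirmed on a small three-dimensional sample. The sketch below is an assumption-laden illustration: the helper functions, the five sample points, and the pivot are all hypothetical, and the cofactor-based determinant is suitable only for small $k$.

```python
from fractions import Fraction


def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def inverse(m):
    """Inverse via the adjugate; exact when the entries are Fractions (k >= 2)."""
    k, d = len(m), det(m)

    def cofactor(i, j):
        minor = [row[:j] + row[j + 1:] for r, row in enumerate(m) if r != i]
        return (-1) ** (i + j) * det(minor)

    return [[Fraction(cofactor(j, i)) / d for j in range(k)] for i in range(k)]


def scatter_matrix(points, pivot):
    """Matrix of sums of products of deviations about the pivot."""
    k = len(pivot)
    return [[sum((p[i] - pivot[i]) * (p[j] - pivot[j]) for p in points)
             for j in range(k)] for i in range(k)]


def scatter_identity(points, pivot):
    """Return (scatter about pivot, |u|(1 + n * quadratic form), internal scatter)."""
    n, k = len(points), len(pivot)
    xbar = [Fraction(sum(p[i] for p in points), n) for i in range(k)]
    u = scatter_matrix(points, xbar)
    u_inv = inverse(u)
    quad = sum(u_inv[i][j] * (xbar[i] - pivot[i]) * (xbar[j] - pivot[j])
               for i in range(k) for j in range(k))
    return det(scatter_matrix(points, pivot)), det(u) * (1 + n * quad), det(u)


points = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3), (1, 1, 1)]
lhs, rhs, internal = scatter_identity(points, (2, -1, 1))
```

Since the quadratic form is nonnegative, `lhs` can never fall below `internal`, which is the content of the $k$-dimensional version of 18.1.1.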

It will be noted that ${}_kS_{x_0,n}$ is the determinant of the Gramian matrix
$A'A$ where $A$ is the $n \times k$ matrix

$$A = \begin{Vmatrix} (x_{11} - x_{10}) & \cdots & (x_{k1} - x_{k0}) \\ \vdots & & \vdots \\ (x_{1n} - x_{10}) & \cdots & (x_{kn} - x_{k0}) \end{Vmatrix}$$

and $A'$ is the transpose of $A$. Also $\|u_{ij}\|$ is the Gramian matrix $B'B$
where $B$ is $A$ with $x_{i0}$ replaced by $\bar{x}_i$, $i = 1, \ldots, k$, and $B'$ is the transpose
of $B$.

If the sample $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$ is from a $k$-dimensional
distribution whose vector of means is $(\mu_1, \ldots, \mu_k)$ and whose (nonsingular)
covariance matrix is $\|\sigma_{ij}\|$, then following an argument similar to that by
which (18.1.15) was established, it will be found that

(18.1.24) $$\mathscr{E}({}_kS_{x_0,n}) = k!\binom{n}{k} \cdot |\sigma_{ij} + (\mu_i - x_{i0})(\mu_j - x_{j0})|$$

which can be written in the alternative form

(18.1.25) $$\mathscr{E}({}_kS_{x_0,n}) = k!\binom{n}{k} |\sigma_{ij}| \cdot \Big[ 1 + \sum_{i,j=1}^{k} \sigma^{ij} (\mu_i - x_{i0})(\mu_j - x_{j0}) \Big].$$

This is the $k$-dimensional analogue of (18.1.17) from which the $k$-dimensional
form of 18.1.2 follows, that is, the minimum of $\mathscr{E}({}_kS_{x_0,n})$ occurs if
and only if $(x_{10}, \ldots, x_{k0})$ is chosen as the vector of means $(\mu_1, \ldots, \mu_k)$,
this minimum having the value

(18.1.26) $$\mathscr{E}({}_kS_{\mu,n}) = \mathscr{E}(|v_{ij}|) = k!\binom{n}{k} |\sigma_{ij}|$$

where

(18.1.27) $$v_{ij} = v_{ji} = \sum_{\xi=1}^{n} (x_{i\xi} - \mu_i)(x_{j\xi} - \mu_j).$$

The determinant $|\sigma_{ij}|$ is sometimes called the generalized variance of
$F(x_1, \ldots, x_k)$. It can also be regarded as the internal scatter of the
distribution having c.d.f. $F(x_1, \ldots, x_k)$.

The reader who has followed the details of establishing (18.1.20) will
have very little difficulty in seeing that the $k$-dimensional version of (18.1.20),
that is, the scatter ${}_kS_{\bar{x},\,n_1+n_2}$ of two samples pooled together, about the mean
of the pooled samples is given by (18.1.20) with $i, j = 1, \ldots, k$. For a
detailed discussion of multidimensional statistical scatter the reader is
referred to Wilks (1960a).

18.2 THE WISHART DISTRIBUTION 

(a) Derivation of the Wishart Distribution 

In the preceding section we have seen that the mean value of the scatter
of a sample from a $k$-dimensional distribution is relatively easy to determine
for samples from an arbitrary distribution having finite means and
covariance matrix. However, the problem of determining higher moments
of a scatter seems to be very difficult for samples from such distributions,
except in the case of sampling from a $k$-dimensional normal distribution.
As a matter of fact, we can find not only the moments of the scatter,
but the distribution function of the elements of the scatter matrix for
$k$-dimensional samples from a $k$-dimensional normal distribution if the
pivotal point $(x_{10}, \ldots, x_{k0})$ is chosen as the mean of the normal distribution.
The distribution of the elements of the scatter matrix in this case
is a remarkable result known as the Wishart (1928) distribution which is
fundamental in the theory of multivariate statistical inference thus far
developed. We shall give a derivation of the Wishart distribution based
on the use of characteristic functions that will be seen to be valid for every
point in the space of $\{v_{ij}\}$ for which $\|v_{ij}\|$ is positive definite, provided $\|\sigma_{ij}\|$
is positive definite and $k < n$.

Let $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$ be a sample from the $k$-dimensional
normal distribution $N(\{\mu_i\}; \|\sigma_{ij}\|)$ whose p.d.f. is given in (7.4.1). A
sample of size $n$ from a $k$-dimensional distribution is defined in Section 8.1.
Consider the scatter ${}_kS_{\mu,n}$ of this sample about the population mean
$(\mu_1, \ldots, \mu_k)$. We have

(18.2.1) $${}_kS_{\mu,n} = |v_{ij}|$$

where

(18.2.2) $$v_{ij} = \sum_{\xi=1}^{n} (x_{i\xi} - \mu_i)(x_{j\xi} - \mu_j), \quad i, j = 1, \ldots, k.$$

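If we denote by $\varphi(\{t_{ij}\})$, $t_{ij} = t_{ji}$, the characteristic function of
$\{v_{ii}, i = 1, \ldots, k;\ 2v_{ij}, i > j = 1, \ldots, k\}$ we have

(18.2.3) $$\varphi(\{t_{ij}\}) = \left[ \frac{\sqrt{|\sigma^{ij}|}}{(2\pi)^{k/2}} \right]^{n} \int_{R_{nk}} \exp\Big[ -\tfrac{1}{2} \sum_{\xi=1}^{n} \sum_{i,j=1}^{k} (\sigma^{ij} - 2i t_{ij})(x_{i\xi} - \mu_i)(x_{j\xi} - \mu_j) \Big] \prod_{\xi=1}^{n} \prod_{i=1}^{k} dx_{i\xi}.$$

Since $\varphi(0) = 1$ we note that the integral in (18.2.3) has the value
$\big[ (2\pi)^{k/2} / \sqrt{|\sigma^{ij} - 2i t_{ij}|} \big]^{n}$. Hence,

(18.2.4) $$\varphi(\{t_{ij}\}) = |\sigma^{ij}|^{n/2} \cdot |\tau_{ij}|^{-n/2}$$

where

(18.2.5) $$\tau_{ij} = \sigma^{ij} - 2i t_{ij}, \quad i, j = 1, \ldots, k.$$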

Applying the multidimensional version of Levy’s theorem, namely, 5.2.1, 

the probability density function of / = 1,..., Ar; 2u„, f >y = 1. k} 

is given by 

(18.2.6) 

/({»«. 2o„})= (:p) ' J expl-/ 2 H 

Mtt/ JR\kiJk+D L -J 

I.et T„ =* 0,,. Then 0,, — t„, dt^j — / dS^jy and 

rrn+ioo r fc 1 k 

(18.2.7). /({»«. 2 »„}) = A exp 2 «iAi 'IT 

Jr«-ioo J 


[?2^] J **** [” 2 ^ ^ 

L(27r)*'^J JRnk L 2 <. 7=1 I .>=1 Jlf
(18.2.8)


A = 



(2^,)*»(»+!) 


We shall perform the integration iteratively as follows: First with respect
to $\theta_{kk}$, then with respect to $\theta_{1k}, \ldots, \theta_{k-1,k}$, next with respect to $\theta_{k-1,k-1}$,
then with respect to $\theta_{1,k-1}, \ldots, \theta_{k-2,k-1}$, and so on through $k$ such cycles.
We may write (18.2.7) as follows:


(18.2.9) 

where 


(18.2.10) 


/({««, 2t.„}) = ^H, 


J V^fc + ioo 

Tilt — i 00 
*■= 1 ,. • • 


exp 



Jk-1 





+ i 00 r — 1 

1 X 

H,= 

exp 2 

• ie«l(X*-”) n d6,, 


Jrij-iao Lt,i = l 

J t>; = l 


r r*fc + r 00 

Jfc-1 

H,= 

exp (ut*0«fe) • 




*,J=1 


and where 

Now in /fg let 
(18.2.11) 




011 • 





ij = l 


W. 


Then, we have 


k-l "I Tc + io 

(18.2.12) H, = exp u,, ^ * 

iti = l J vc-iao 


w dw 


where c is a real and positive constant whose value we need not write down. 
The integral in (18.2.12) is basically a Hankel integral (see the example in 
Section 5.1) and has a value given by 


(18.2.13) 


j ivj”-\27ri) 

' r(in) 


Inserting the value of//g in (18.2.9), we obtain 
(18.2.14) /({p«. 2v,,}) = A -H, -1, -G 

where
Making the transformation = lyt^, i = 1. k — 1 in G, we find

(18.2.16) 

r+ta+iut r *-i *-i 1*“^ 

C = (/)*-^ exp 2 +2/2 n ^fuc- 

+ •- *.^ = 1 * = 1 

»*»1. k — 1 

It will be noted that the integral in (18.2.16) is an integral of a function of
$(k - 1)$ complex variables taken over the entire $(k - 1)$-dimensional space
of the real parts of these complex variables for fixed values of the
imaginary parts. It can be shown that the integral does not depend on
what fixed (finite) values of the imaginary parts are chosen. Hence, the
fixed values may be chosen as 0. Thus, the integral has the same value as if
$y_{1k}, \ldots, y_{k-1,k}$ were real, with the integral taken over the $(k - 1)$-dimensional
space of $y_{1k}, \ldots, y_{k-1,k}$. The value of this integral can be written
down by referring to (7.4.13) and (7.4.17). Thus


(18.2.17) G = A • exp I - 2 0,,^ I 

L <,,=1 vj,j, J 

where 

Putting this value of G in (18.2.14) we may now write 

(18.2.19) 

rni+ioo r*-i 1 »-i 

/({*^ii’2Uj^}) =/l(/i7i) I exp 2 n 

Jrij-iao L»,> = 1 J »>; = 1 

<>J = 1.A:-l 

where 

(18.2.20) , /,;• = 1,..., k - 1. 


- ‘.^=1 ” ffc* - 


i>i = l .A:-i 


y r = Vri — 


i,j = 1,..., k — 1. 


It will now be seen that the integral appearing in (18.2.19) has the same
structure as that appearing in (18.2.7) except that $k$, $n$, $v_{ij}$ are replaced by
$k - 1$, $n - 1$, $v^{(1)}_{ij}$, respectively.

Therefore, by successively repeating cycles of integration similar to that 
already performed, we obtain 

(18.2.21) f(v„, 2u,,}) = • • • (V*) 

where and Ji are given by (18.2.13) and (18.2.18) and 




(t>[*i-i))»("-*^^>-i(2ir/) 

r(!U:l±i) 




(18.2.22) 





and where is given in (18.2.20) and 


„«) = 


vr, = v)y - 


„(« ,,<i) 

,( 1 ) _ 


„<i) 


i,J .fe - 2 


(18.2.23) 




- ‘’ll “ "jra • 

I'M 


Substituting these values of /j,..., /*, J^,... ,J^ into (18.2.21), we find 

(18.2.24) 

• • ■ i;‘'5-i>]‘‘"-*-i’ exp X I'oi'o] 

L ij=^i J 


f({Vii,2vn}) = • 


2U(fc-l)^fc(fc-l)/4p 


at any point in the sample space of $\{v_{ii}, 2v_{ij}\}$ in which $\|v_{ij}\|$ is positive
definite and 0 otherwise. But the structure of the elements in the sequence
$v_{kk}, v^{(1)}_{k-1,k-1}, \ldots, v^{(k-1)}_{11}$ is similar to that of the corresponding sequence
defined in Section 7.4(a). It therefore follows from (7.4.6) that

(18.2.25) $$v_{kk}\, v^{(1)}_{k-1,k-1} \cdots v^{(k-1)}_{11} = |v_{ij}|.$$

Now the expression on the right of (18.2.24) is the p.d.f. of
$\{v_{ii}, i = 1, \ldots, k;\ 2v_{ij}, i > j = 1, \ldots, k\}$. Thus, if $g(\{v_{ij}\})$ is the p.d.f.
of the $\{v_{ij}, i \geq j = 1, \ldots, k\}$, we have

(18.2.26) $$g(\{v_{ij}\}) = 2^{k(k-1)/2} f(\{v_{ii}, 2v_{ij}\}).$$

Remembering that we finally obtain the following basic 

result due to Wishart (1928): 


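18.2.1 If $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$, $k < n$, is a sample from the
$k$-dimensional distribution $N(\{\mu_i\}; \|\sigma_{ij}\|)$, and if $\|v_{ij}\|$ is the scatter
matrix of the sample about the population mean $(\mu_1, \ldots, \mu_k)$, as
defined by (18.2.2), then the elements of $\|v_{ij}\|$ have the p.d.f.

(18.2.27) $$g(\{v_{ij}\}) = \frac{|\sigma^{ij}|^{n/2}\, |v_{ij}|^{(n-k-1)/2} \exp\Big[ -\tfrac{1}{2} \sum_{i,j=1}^{k} \sigma^{ij} v_{ij} \Big]}{2^{nk/2}\, \pi^{k(k-1)/4} \prod_{i=1}^{k} \Gamma\big( \tfrac{n+1-i}{2} \big)}$$

in the $\tfrac{1}{2}k(k + 1)$-dimensional region for which $\|v_{ij}\|$ is positive
definite and 0 otherwise.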

It is convenient to say that any matrix $\|v_{ij}\|$ of random variables whose
elements have the p.d.f. (18.2.27) has the Wishart distribution $W(k, n, \|\sigma_{ij}\|)$
which is characterized by the parameters $k$, $n$, and $\|\sigma_{ij}\|$.

The distribution was originally established by Wishart (1928) by a
method of geometric argument, whereas a slightly different version of the
geometric argument was presented later by Mahalanobis, Bose, and Roy
(1937). The distribution has also been derived by various other methods
by Ingham (1933), Wishart and Bartlett (1932), Madow (1938), Hsu (1939a),
Sverdrup (1947), and Rasch (1948). The derivation presented above is
essentially that given by Ingham, Wishart, and Bartlett.

For the case $k = 1$ it should be noted that $\sigma^{11} = 1/\sigma^2$ where $\sigma^2$ is the
variance of the normal distribution from which the sample comes, and the
Wishart distribution $W(1, n, \sigma^2)$ reduces to a distribution in which $v_{11}/\sigma^2$
has the chi-square distribution with $n$ degrees of freedom.
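
The Wishart p.d.f. and its reduction to the chi-square case for $k = 1$ can be examined numerically. The sketch below assumes the standard form of the density (the normalizing constant $2^{nk/2} \pi^{k(k-1)/4} \prod_i \Gamma(\tfrac{1}{2}(n + 1 - i))$); the function names are hypothetical, and the cofactor-based determinant and inverse are intended only for small $k$.

```python
import math


def det(m):
    """Determinant by cofactor expansion (adequate for small matrices)."""
    if len(m) == 0:
        return 1.0
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * a * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j, a in enumerate(m[0]))


def inverse(m):
    """Inverse via the adjugate matrix."""
    k, d = len(m), det(m)

    def cofactor(i, j):
        minor = [row[:j] + row[j + 1:] for r, row in enumerate(m) if r != i]
        return (-1) ** (i + j) * det(minor)

    return [[cofactor(j, i) / d for j in range(k)] for i in range(k)]


def wishart_logpdf(v, sigma, n):
    """Log of the Wishart density W(k, n, sigma) at a positive definite v."""
    k = len(v)
    sigma_inv = inverse(sigma)
    trace_term = sum(sigma_inv[i][j] * v[i][j] for i in range(k) for j in range(k))
    log_norm = ((n * k / 2) * math.log(2) + (k * (k - 1) / 4) * math.log(math.pi)
                + sum(math.lgamma((n + 1 - i) / 2) for i in range(1, k + 1)))
    return ((n / 2) * math.log(det(sigma_inv))
            + ((n - k - 1) / 2) * math.log(det(v))
            - 0.5 * trace_term - log_norm)


def scaled_chi2_logpdf(v, sigma2, n):
    """Log density at v of sigma2 times a chi-square variable with n d.f."""
    x = v / sigma2
    return ((n / 2 - 1) * math.log(x) - x / 2
            - (n / 2) * math.log(2) - math.lgamma(n / 2) - math.log(sigma2))


wishart_value = wishart_logpdf([[3.7]], [[2.5]], 6)
chi2_value = scaled_chi2_logpdf(3.7, 2.5, 6)
```

The term `- math.log(sigma2)` is the Jacobian of the change of variable from $v_{11}/\sigma^2$ to $v_{11}$.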

It should be noted that the number of random variables in the sample is
$nk$ and the number in the Wishart distribution is $\tfrac{1}{2}k(k + 1)$ which means
that the Wishart distribution is a $\tfrac{1}{2}k(k + 1)$-dimensional marginal distribution
determined from the $nk$-dimensional distribution of the sample
elements. James (1954) has shown that a transformation of the $x_{i\xi}$ exists
which transforms the probability element of the sample into the probability
elements of three independent distributions, one of which is the Wishart
distribution, another is a distribution concerning the $k$-plane spanned
by the $k$ $n$-dimensional vectors $[(x_{i1} - \mu_i), \ldots, (x_{in} - \mu_i)]$, $i = 1, \ldots, k$,
and the third one is the distribution of the orthogonal $k \times k$ matrix which
determines the orientation of these $k$ vectors in the $k$-plane.


(b) Moments and Distribution of the Scatter in Samples from a Normal 
Distribution 


In (18.1.26) we have seen that the mean value of the scatter of a
sample $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$ about the mean $(\mu_1, \ldots, \mu_k)$ of the
distribution from which the sample is drawn is $k!\binom{n}{k} |\sigma_{ij}|$ where $\|\sigma_{ij}\|$
is the covariance matrix of the distribution. In the particular case of a
$k$-dimensional normal distribution, the $r$th moment of $|v_{ij}|$ can be found
as follows.

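First, it will be convenient to let

(18.2.28) $$g(\{v_{ij}\}) = K(k, n, \{\sigma_{ij}\}) \cdot h(k, n, \{\sigma_{ij}\}, \{v_{ij}\})$$

where

(18.2.29) $$K(k, n, \{\sigma_{ij}\}) = \frac{|\sigma^{ij}|^{n/2}}{2^{nk/2}\, \pi^{k(k-1)/4} \prod_{i=1}^{k} \Gamma\big( \tfrac{n+1-i}{2} \big)}$$

and, over the space of the $\{v_{ij}\}$ in which $\|v_{ij}\|$ is positive definite,

(18.2.30) $$h(k, n, \{\sigma_{ij}\}, \{v_{ij}\}) = |v_{ij}|^{(n-k-1)/2} \exp\Big[ -\tfrac{1}{2} \sum_{i,j=1}^{k} \sigma^{ij} v_{ij} \Big]$$

and 0 elsewhere. Then

(18.2.31) $$\mathscr{E}(|v_{ij}|^r) = K(k, n, \{\sigma_{ij}\}) \int_{R_{\frac{1}{2}k(k+1)}} |v_{ij}|^r\, h(k, n, \{\sigma_{ij}\}, \{v_{ij}\}) \prod_{i \geq j = 1}^{k} dv_{ij}.$$

Now since the integral of $g(\{v_{ij}\})$ over the entire sample space of the $\{v_{ij}\}$
is unity, we have

(18.2.32) $$\int_{R_{\frac{1}{2}k(k+1)}} h(k, n, \{\sigma_{ij}\}, \{v_{ij}\}) \prod_{i \geq j = 1}^{k} dv_{ij} = \frac{1}{K(k, n, \{\sigma_{ij}\})}.$$

But the integrand of (18.2.31) is $h(k, n + 2r, \{\sigma_{ij}\}, \{v_{ij}\})$ and the integral
of this function over the sample space of the $\{v_{ij}\}$ is $1/K(k, n + 2r, \{\sigma_{ij}\})$.
Therefore

(18.2.33) $$\mathscr{E}(|v_{ij}|^r) = \frac{K(k, n, \{\sigma_{ij}\})}{K(k, n + 2r, \{\sigma_{ij}\})} = |2\sigma_{ij}|^r \prod_{i=1}^{k} \frac{\Gamma\big( \tfrac{n+1-i}{2} + r \big)}{\Gamma\big( \tfrac{n+1-i}{2} \big)}.$$

Putting $r = 1$ we find for the case of $|v_{ij}|$ in a sample of size $n$ from
$N(\{\mu_i\}; \|\sigma_{ij}\|)$

$$\mathscr{E}(|v_{ij}|) = k!\binom{n}{k} |\sigma_{ij}|.$$

This result, it will be recalled from (18.1.26), holds for the more general
case of $|v_{ij}|$ in a sample of size $n$ from an arbitrary $k$-dimensional distribution
with covariance matrix $\|\sigma_{ij}\|$.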
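
The moment formula, read here as $|2\sigma_{ij}|^r \prod_{i=1}^{k} \Gamma(\tfrac{1}{2}(n + 1 - i) + r) / \Gamma(\tfrac{1}{2}(n + 1 - i))$, can be evaluated with log-gamma functions and compared for $r = 1$ with the mean value $k!\binom{n}{k}|\sigma_{ij}|$. The sketch is illustrative only; the function names and the numerical values of $k$, $n$, and $|\sigma_{ij}|$ are assumptions.

```python
import math


def scatter_moment(r, k, n, det_sigma):
    """r-th moment of |v_ij| for a Wishart W(k, n, sigma) matrix.

    Uses |2 sigma|^r = (2^k |sigma|)^r and a product of gamma-function ratios.
    Valid whenever (n + 1 - i)/2 + r > 0 for every i = 1, ..., k.
    """
    log_moment = r * (k * math.log(2) + math.log(det_sigma))
    for i in range(1, k + 1):
        a = (n + 1 - i) / 2
        log_moment += math.lgamma(a + r) - math.lgamma(a)
    return math.exp(log_moment)


def mean_scatter(k, n, det_sigma):
    """Mean scatter about the population mean: k! C(n, k) |sigma|."""
    return math.factorial(k) * math.comb(n, k) * det_sigma


first_moment = scatter_moment(1, 3, 7, 2.5)
expected_mean = mean_scatter(3, 7, 2.5)
```

With $k = 3$, $n = 7$, and $|\sigma_{ij}| = 2.5$, both expressions equal 525 up to floating-point rounding.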

Note that we may write $\mathscr{E}(|v_{ij}|^r)$ as follows:

(18.2.34) $$\mathscr{E}(|v_{ij}|^r) = \int_0^\infty \cdots \int_0^\infty (|2\sigma_{ij}|\, z_1 \cdots z_k)^r \prod_{i=1}^{k} \frac{z_i^{\frac{1}{2}(n+1-i) - 1} e^{-z_i}}{\Gamma\big( \tfrac{n+1-i}{2} \big)}\, dz_i$$

from which it is evident that the distribution of $|v_{ij}|$ is identical with the
distribution of $|2\sigma_{ij}| (z_1 \cdots z_k)$ where the $z_i$ are independent random
variables having gamma distributions $G(\tfrac{1}{2}(n + 1 - i))$, $i = 1, \ldots, k$,
respectively.

Summarizing we have the following result:

18.2.2 If $|v_{ij}|$ is the scatter of a sample of size $n$ from $N(\{\mu_i\}; \|\sigma_{ij}\|)$ about
the population mean $(\mu_1, \ldots, \mu_k)$, the distribution of $|v_{ij}|$ is
identical with that of $|2\sigma_{ij}| \prod_{i=1}^{k} z_i$ where the $z_i$ are independent
random variables having gamma distributions $G(\tfrac{1}{2}(n + 1 - i))$,
$i = 1, \ldots, k$, respectively.

For an explicit form of the distribution function of $|v_{ij}|$ expressed as an
integral the reader is referred to Wilks (1932).

(c) Reproductivity of the Wishart Distribution 

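The law of reproductivity for the Wishart distribution may be stated
as follows:

18.2.3 If $\{v^{(1)}_{ij}\}$ and $\{v^{(2)}_{ij}\}$ are independent sets of random variables having
Wishart distributions $W(k, n_1, \|\sigma_{ij}\|)$ and $W(k, n_2, \|\sigma_{ij}\|)$, the set
of random variables $\{v^{(1)}_{ij} + v^{(2)}_{ij}\}$ has the Wishart distribution
$W(k, n_1 + n_2, \|\sigma_{ij}\|)$.

This can be established at once by characteristic functions. For it
follows from (18.2.4) that the characteristic functions of $\{v^{(1)}_{ij}\}$ and $\{v^{(2)}_{ij}\}$
are

$$|\sigma^{ij}|^{n_1/2} \cdot |\sigma^{ij} - 2i t_{ij}|^{-n_1/2} \quad \text{and} \quad |\sigma^{ij}|^{n_2/2} \cdot |\sigma^{ij} - 2i t_{ij}|^{-n_2/2}$$

respectively, where $\|\sigma^{ij}\| = \|\sigma_{ij}\|^{-1}$ and $t_{ij} = t_{ji}$.
Since $\{v^{(1)}_{ij}\}$ and $\{v^{(2)}_{ij}\}$ are independent, the characteristic function of
$\{v^{(1)}_{ij} + v^{(2)}_{ij}\}$ is the product of these two characteristic functions, that is,

$$|\sigma^{ij}|^{(n_1+n_2)/2} \cdot |\sigma^{ij} - 2i t_{ij}|^{-(n_1+n_2)/2}$$

which is the characteristic function of the Wishart distribution
$W(k, n_1 + n_2, \|\sigma_{ij}\|)$.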

A basic reproductive-like result for Wishart and normal distributions
which will be useful later may be stated as follows:

18.2.4 Let $\{a_{ij}\}$ be a set of random variables having the Wishart distribution
$W(k, n, \|\sigma_{ij}\|)$ and let $(b_{1\xi}, \ldots, b_{k\xi})$, $\xi = 1, \ldots, p$, be $p$
independent sets of random variables having identical distributions
$N(\{0\}; \|\sigma_{ij}\|)$, which are also independent of the set $\{a_{ij}\}$. Then the
set of random variables $\{a_{ij} + \sum_{\xi=1}^{p} b_{i\xi} b_{j\xi}\}$ has the Wishart distribution
$W(k, n + p, \|\sigma_{ij}\|)$.

The proof of 18.2.4 by characteristic functions is similar to that of 18.2.3
and is left as an exercise for the reader. For $p > k$ 18.2.4 is an immediate
consequence of 18.2.3.

18.3 INDEPENDENCE OF MEANS AND INTERNAL
SCATTER MATRIX IN SAMPLES FROM k-DIMENSIONAL
NORMAL DISTRIBUTIONS

Suppose $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$ is a sample from the $k$-dimensional
distribution $N(\{\mu_i\}; \|\sigma_{ij}\|)$ and let $\|u_{ij}\|$ be the internal scatter matrix
of the sample. We shall show that:

18.3.1 The elements of the internal scatter matrix $\|u_{ij}\|$ and the sample
means $\bar{x}_1, \ldots, \bar{x}_k$ are independent sets of random variables having
the distributions $W(k, n - 1, \|\sigma_{ij}\|)$ and $N(\{\mu_i\}; \|\sigma_{ij}/n\|)$, respectively.

To establish this result we shall consider the characteristic function of 
/ = 1,..., A:, 2t)y, and ((Sj - n^), ii^) 

defined as 

(18.3.1) {/,}) = ^|exp i 2 fata + ' “/“<)<«]) 

V L t’ = i J' 

where is the scatter matrix of the sample about the population mean 
(^ 1 , ...,//*) as defined in (18.1.27). It will be noted that the right-hand 
side of (18.3.1) can be expressed as 

(18.3.2) ^^|exp ^ + I 2 ^ ^]|) 

where Evaluating (18.3.2) from results given in Section 7.4, 

we find that 

(18.3.3) <K{ta}, (t,}) = |(r‘f" • kir• exp T-i i a* ^'1 

L »,,=i n J 

where |lcr«|| = Ik^ - 2/r..,|l and |k,*ll = ll<^2ir*- 
Now applying the multidimensional form of Levy's theorem 5.2.1,
the p.d.f. of $\{u_{ii}, (\bar{x}_i - \mu_i), i = 1, \ldots, k;\ 2u_{ij}, i > j = 1, \ldots, k\}$ is given by

(18.3.4) (f f exp [-/ i - / i (X, 

' YL ^Ui
Making use of results in Section 7.4 and performing the integration with
respect to the we find first that expression (18.3.4) reduces to 


(18.3.5) /.:^Mexp| 

(27r)” 1 


where 


/1 \te(«!+i) r 


(18.3.6) /- i 

\Z7r/ •'«**(*+1) 

r 


• “PL"'. 

and 



»ii * -U/)-'! “ *<)(»« - »/)• 

f=i 

Thus, by comparing (18.3.6) with (18.2.6) it is seen that $I$ is the p.d.f. of
the set of random variables $(u_{ii}, i = 1, \ldots, k;\ 2u_{ij}, i > j = 1, \ldots, k)$
where the $u_{ij}$ have the Wishart distribution $W(k, n - 1, \|\sigma_{ij}\|)$.

Thus, (18.3.5) is the product of the p.d.f. of the Wishart distribution
$W(k, n - 1, \|\sigma_{ij}\|)$ and the p.d.f. of the normal distribution $N(\{0\}; \|\sigma_{ij}/n\|)$,
thereby establishing 18.3.1.



18.4 HOTELLING’S GENERALIZED STUDENT DISTRIBUTION 


(a) Case of One Sample 

Suppose we have a sample from a $k$-dimensional normal distribution
and we wish to test the hypothesis that the vector of means of the distribution
has the value $(\mu_1^0, \ldots, \mu_k^0)$. A test based on scatter analysis which
suggests itself is to compare the internal scatter $|u_{ij}|$ of the sample with
the scatter $|v_{ij}|$ of the sample about the specified mean $(\mu_1^0, \ldots, \mu_k^0)$ of the
distribution. This suggests taking the ratio

(18.4.1) $$R_1 = \frac{|u_{ij}|}{|v_{ij}|} = \frac{|u_{ij}|}{|u_{ij} + n(\bar{x}_i - \mu_i^0)(\bar{x}_j - \mu_j^0)|}.$$

Since

(18.4.2) $$|v_{ij}| = |u_{ij}| \cdot \Big[ 1 + n \sum_{i,j=1}^{k} u^{ij} (\bar{x}_i - \mu_i^0)(\bar{x}_j - \mu_j^0) \Big]$$

we can write $R_1$ in the alternative form

(18.4.3) $$R_1 = \frac{1}{1 + nD^2/(n - 1)}$$

where

(18.4.4) $$D^2 = (n - 1) \sum_{i,j=1}^{k} u^{ij} (\bar{x}_i - \mu_i^0)(\bar{x}_j - \mu_j^0)$$

and $\|u^{ij}\| = \|u_{ij}\|^{-1}$.

It is evident that $R_1$ lies on $[0, 1]$, and is unity if and only if the vector of
sample means $(\bar{x}_1, \ldots, \bar{x}_k)$ is equal to the vector of population means
$(\mu_1^0, \ldots, \mu_k^0)$. We assume, of course, that $\|u_{ij}\|$ is positive definite, which
occurs with probability 1 if $n > k$. The quantity $D^2$ is known as the
Mahalanobis (1936) (squared) distance between the sample and population
or more briefly Mahalanobis' $D^2$. The larger the value of $D^2$ the smaller
the value of $R_1$, of course. The quantity $T^2$ defined by

(18.4.5) $$T^2 = nD^2$$

is called Hotelling's (1931) (squared) generalized Student ratio, or more
briefly Hotelling's $T^2$.
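
The relations among $D^2$, $T^2$, and the ratio of scatters $R_1$ can be illustrated on a small bivariate sample. The sketch below is restricted to $k = 2$ so that determinants and inverses can be written out directly; the function names, the five sample points, and the hypothesized mean are assumptions made for the example.

```python
from fractions import Fraction


def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of a nonsingular 2 x 2 matrix."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d], [-m[1][0] / d, m[0][0] / d]]


def scatter_matrix(points, pivot):
    """2 x 2 matrix of sums of products of deviations about the pivot."""
    return [[sum((p[i] - pivot[i]) * (p[j] - pivot[j]) for p in points)
             for j in range(2)] for i in range(2)]


def hotelling_one_sample(points, mu0):
    """Return (D^2, T^2, R_1) for a bivariate sample and hypothesized mean mu0."""
    n = len(points)
    xbar = [Fraction(sum(p[i] for p in points), n) for i in range(2)]
    u = scatter_matrix(points, xbar)   # internal scatter matrix
    v = scatter_matrix(points, mu0)    # scatter matrix about mu0
    u_inv = inv2(u)
    diff = [xbar[i] - mu0[i] for i in range(2)]
    d2 = (n - 1) * sum(u_inv[i][j] * diff[i] * diff[j]
                       for i in range(2) for j in range(2))
    t2 = n * d2
    r1 = det2(u) / det2(v)
    return d2, t2, r1


points = [(1, 2), (3, 1), (4, 5), (6, 4), (2, 6)]
d2, t2, r1 = hotelling_one_sample(points, (3, 3))
```

Exact rational arithmetic makes the identity $R_1 = 1/(1 + T^2/(n - 1))$ hold as an equality between fractions, so the two routes to $R_1$ can be compared without tolerance.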

It should be remarked that $R_1 = \lambda^{2/n}$ where $\lambda$ is the Neyman-Pearson
likelihood ratio for testing the composite statistical hypothesis (see Section
13.3) $\mathscr{H}(\omega; \Omega)$ where $\Omega$ is the admissible set of points in the parameter
space consisting of all values of $\sigma_{ij}$, $i \geq j = 1, \ldots, k$, for which $\|\sigma_{ij}\|$ is
positive definite and all real values of the $\mu_i$, whereas $\omega$ is the subset of $\Omega$
consisting of all points of $\Omega$ for which $\mu_i = \mu_i^0$, $i = 1, \ldots, k$. The proof
of this statement is left as an exercise for the reader.

We now consider the sampling distribution of $R_1$ if $\mathscr{H}(\omega; \Omega)$ is true,
that is, if the normal distribution from which the sample is drawn has
the vector of means $(\mu_1^0, \ldots, \mu_k^0)$. A relatively simple way to do this is by
first determining the moments of $R_1$, then inferring the distribution from
the moments. Thus we wish to find the value of

$$\mathscr{E}(R_1^r) = \mathscr{E}(|u_{ij}|^r\, |v_{ij}|^{-r}), \quad r = 1, 2, \ldots.$$

Since the elements of $\|v_{ij}\|$ have the Wishart distribution $W(k, n, \|\sigma_{ij}\|)$,
it follows that $\mathscr{E}(|v_{ij}|^{-r})$ is given by (18.2.33) with $r$ replaced by $-r$. But
$v_{ij} = u_{ij} + b_i b_j$, where $b_i = \sqrt{n}(\bar{x}_i - \mu_i^0)$, and where the $\{u_{ij}\}$ and $\{b_i\}$
are independent sets of random variables having distributions $W(k, n - 1,
\|\sigma_{ij}\|)$ and $N(\{0\}; \|\sigma_{ij}\|)$ respectively, as we have seen in 18.3.1.

Therefore, we may write


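(18.4.6) $$K(k, n - 1, \{\sigma_{ij}\}) \frac{|\sigma^{ij}|^{1/2}}{(2\pi)^{k/2}} \int_{R_{\frac{1}{2}k(k+1)}} \int_{R_k} |u_{ij} + b_i b_j|^{-r} \exp\Big[ -\tfrac{1}{2} \sum_{i,j=1}^{k} \sigma^{ij} b_i b_j \Big] \cdot h(k, n - 1, \{\sigma_{ij}\}, \{u_{ij}\}) \prod_{i=1}^{k} db_i \prod_{i \geq j = 1}^{k} du_{ij} = |2\sigma_{ij}|^{-r} \prod_{i=1}^{k} \frac{\Gamma\big( \tfrac{n+1-i}{2} - r \big)}{\Gamma\big( \tfrac{n+1-i}{2} \big)}$$

where $K(k, n - 1, \{\sigma_{ij}\})$ and $h(k, n - 1, \{\sigma_{ij}\}, \{u_{ij}\})$ are defined in Section
18.2(b).

If $n$ is replaced by $n + 2r$ throughout (18.4.6) and if both sides of the
resulting equation are multiplied by $K(k, n - 1, \{\sigma_{ij}\})/K(k, n + 2r - 1,
\{\sigma_{ij}\})$, it will be seen that the left-hand side of the equation is the integral
which defines $\mathscr{E}(R_1^r)$ and the right-hand side is the value of the integral.
Therefore, we have

(18.4.7) $$\mathscr{E}(R_1^r) = \left[ \frac{|2\sigma_{ij}|^{-r} K(k, n - 1, \{\sigma_{ij}\})}{K(k, n + 2r - 1, \{\sigma_{ij}\})} \right] \prod_{i=1}^{k} \frac{\Gamma\big( \tfrac{n+1-i}{2} \big)}{\Gamma\big( \tfrac{n+1-i}{2} + r \big)}.$$

Substituting the values of $K(k, n - 1, \{\sigma_{ij}\})$, $K(k, n + 2r - 1, \{\sigma_{ij}\})$ from
(18.2.29) into (18.4.7), we obtain, after simplification

(18.4.8) $$\mathscr{E}(R_1^r) = \frac{\Gamma\big( \tfrac{n-k}{2} + r \big) \Gamma\big( \tfrac{n}{2} \big)}{\Gamma\big( \tfrac{n-k}{2} \big) \Gamma\big( \tfrac{n}{2} + r \big)}$$

which holds for $r = 1, 2, \ldots$. It will be observed that the right-hand
side of (18.4.8) can be expressed as the $r$th moment of a beta distribution
as follows:

$$\frac{\Gamma\big( \tfrac{n}{2} \big)}{\Gamma\big( \tfrac{n-k}{2} \big) \Gamma\big( \tfrac{k}{2} \big)} \int_0^1 x^r\, x^{\frac{1}{2}(n-k) - 1} (1 - x)^{\frac{1}{2}k - 1}\, dx.$$

We conclude from 5.5.1a that $R_1$ has the beta distribution $Be(\tfrac{1}{2}(n - k), \tfrac{1}{2}k)$,
that is, the probability element of $R_1$ is

(18.4.9) $$\frac{\Gamma\big( \tfrac{n}{2} \big)}{\Gamma\big( \tfrac{n-k}{2} \big) \Gamma\big( \tfrac{k}{2} \big)}\, R_1^{\frac{1}{2}(n-k) - 1} (1 - R_1)^{\frac{1}{2}k - 1}\, dR_1.$$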





Summarizing, we have

18.4.1 If $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$, $n > k$, is a sample from the normal
distribution $N(\{\mu_i^0\}; \|\sigma_{ij}\|)$, the ratio of scatters $R_1 = |u_{ij}|/|v_{ij}|$ has
the beta distribution $Be(\tfrac{1}{2}(n - k), \tfrac{1}{2}k)$.

Applying the transformation

$$R_1 = \frac{1}{1 + T^2/(n - 1)}$$

to (18.4.9) we obtain, noting that the sample space of $T^2$ is $(0, +\infty)$,

(18.4.10) $$\frac{\Gamma\big( \tfrac{n}{2} \big)}{(n - 1)^{k/2}\, \Gamma\big( \tfrac{k}{2} \big) \Gamma\big( \tfrac{n-k}{2} \big)} \big( 1 + T^2/(n - 1) \big)^{-n/2} (T^2)^{\frac{1}{2}k - 1}\, d(T^2)$$

as the probability element of Hotelling's $T^2$. It will be noted that for $k = 1$,
(18.4.10) reduces to the probability element of the square of the ordinary Student $t$
with $n - 1$ degrees of freedom. (See Section 7.8(b).)
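
The simplification that leads to the beta moments of $R_1$ is a telescoping product of gamma functions. The sketch below checks it numerically under the assumption that, after substituting $K$ from (18.2.29), the $r$th moment is $\prod_{i=1}^{k} \Gamma(\tfrac{1}{2}(n - i) + r)\, \Gamma(\tfrac{1}{2}(n + 1 - i)) / [\Gamma(\tfrac{1}{2}(n - i))\, \Gamma(\tfrac{1}{2}(n + 1 - i) + r)]$; the function names are hypothetical.

```python
import math


def moment_product_form(r, k, n):
    """r-th moment of R_1 written as a product over i = 1, ..., k."""
    log_moment = 0.0
    for i in range(1, k + 1):
        log_moment += (math.lgamma((n - i) / 2 + r) - math.lgamma((n - i) / 2)
                       + math.lgamma((n + 1 - i) / 2)
                       - math.lgamma((n + 1 - i) / 2 + r))
    return math.exp(log_moment)


def beta_moment(r, a, b):
    """r-th moment of a beta distribution Be(a, b)."""
    return math.exp(math.lgamma(a + r) + math.lgamma(a + b)
                    - math.lgamma(a) - math.lgamma(a + b + r))


k, n = 3, 10
pairs = [(moment_product_form(r, k, n), beta_moment(r, (n - k) / 2, k / 2))
         for r in range(1, 5)]
```

For $r = 1$ the common value is $(n - k)/n$, the mean of $Be(\tfrac{1}{2}(n - k), \tfrac{1}{2}k)$.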


(b) Case of Two Samples 


Suppose we have two samples $(x^{(1)}_{1\xi_1}, \ldots, x^{(1)}_{k\xi_1};\ \xi_1 = 1, \ldots, n_1)$ and
$(x^{(2)}_{1\xi_2}, \ldots, x^{(2)}_{k\xi_2};\ \xi_2 = 1, \ldots, n_2)$ from normal distributions having identical
sets of parameters. Denote the common distribution by $N(\{\mu_i\}; \|\sigma_{ij}\|)$.
Let $\|u^{(1)}_{ij}\|$ and $\|u^{(2)}_{ij}\|$ be the internal scatter matrices of the two samples
and $\|u_{ij}\|$ the internal scatter matrix of the two samples pooled together
as a single sample. The matrix $\|u^{(1)}_{ij} + u^{(2)}_{ij}\|$ will be called the within-sample
scatter matrix for the two samples. Geometrically, it is the scatter
matrix for the $k$-dimensional cluster one obtains by rigidly translating
one sample cluster with respect to the other (without rotation) until the
means of both samples coincide, and then pooling the two sample clusters
together as a single cluster.

Consider the ratio

(18.4.11) $$R_1' = \frac{|u^{(1)}_{ij} + u^{(2)}_{ij}|}{|u_{ij}|}.$$

We shall show that $R_1'$ has a structure and distribution similar to $R_1$ as
defined in (18.4.3). Let us put

(18.4.12) $$u^{w}_{ij} = u^{(1)}_{ij} + u^{(2)}_{ij}.$$

Then $u_{ij}$, which is the typical element on the right-hand side of (18.1.20),
can be expressed as follows

(18.4.13) $$u_{ij} = u^{w}_{ij} + b_i b_j$$

where

(18.4.14) $$b_i = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\, \big( \bar{x}^{(1)}_i - \bar{x}^{(2)}_i \big).$$

Hence

(18.4.15) $$R_1' = \frac{|u^{w}_{ij}|}{|u^{w}_{ij} + b_i b_j|}.$$

It follows from 18.2.3 that $\{u^{w}_{ij}\}$ has the Wishart distribution
$W(k, n_1 + n_2 - 2, \|\sigma_{ij}\|)$. Also, since the $b_i$ are functions of the means of
the two samples it follows from 18.3.1 that the $\{b_i\}$ are independent of
the $\{u^{w}_{ij}\}$. As a matter of fact the reader will readily verify that the $\{b_i\}$
have the normal distribution $N(\{0\}; \|\sigma_{ij}\|)$.

The problem of finding the distribution of $R_1'$ is thus seen to be similar
to that of finding the distribution of $R_1$ in (18.4.1). In fact, we note that
the distribution of $R_1'$ is identical with that of $R_1$ with $n - 1$ replaced by
$n_1 + n_2 - 2$.

We may write

(18.4.16) $$R_1' = \frac{1}{1 + \dfrac{n_1 n_2}{(n_1 + n_2)(n_1 + n_2 - 2)}\, D'^2}$$

where

(18.4.17) $$D'^2 = (n_1 + n_2 - 2) \sum_{i,j=1}^{k} u_{w}^{ij} \big( \bar{x}^{(1)}_i - \bar{x}^{(2)}_i \big) \big( \bar{x}^{(1)}_j - \bar{x}^{(2)}_j \big)$$

and where $\|u_{w}^{ij}\| = \|u^{w}_{ij}\|^{-1}$.

The quantity $D'^2$ is the Mahalanobis (1930, 1936) (squared) generalized
distance between the two samples.

Hotelling's generalized Student ratio for the two-sample problem is
defined by the relation

$$T'^2 = \frac{n_1 n_2}{n_1 + n_2}\, D'^2$$

and the probability element of the distribution of $T'^2$ is given by (18.4.10)
with $n - 1$ replaced by $n_1 + n_2 - 2$, assuming, of course, that both
samples are independently drawn from identical $k$-dimensional normal
distributions.

Finally, we should remark that $R_1' = \lambda^{2/(n_1 + n_2)}$ where $\lambda$ is the Neyman-Pearson
likelihood ratio for testing the statistical hypothesis $\mathscr{H}(\omega; \Omega)$,
where $\Omega$ is the parameter space for which $\|\sigma^{(1)}_{ij}\| = \|\sigma^{(2)}_{ij}\| = \|\sigma_{ij}\|$ is
positive definite, and where the vectors of means $(\mu^{(1)}_1, \ldots, \mu^{(1)}_k)$ and
$(\mu^{(2)}_1, \ldots, \mu^{(2)}_k)$ are real vectors, while $\omega$ is the subspace of $\Omega$ in which
the two vectors of means are equal; $N(\{\mu^{(1)}_i\}; \|\sigma^{(1)}_{ij}\|)$ and $N(\{\mu^{(2)}_i\}; \|\sigma^{(2)}_{ij}\|)$
being used to label the normal distributions from which the two samples
are drawn. Verification of this is left to the reader.


18.5 THE MULTIDIMENSIONAL MODEL I 
ANALYSIS OF VARIANCE TEST 


The one-dimensional Model I analysis of variance test in its most
general form is the test of the statistical hypothesis $\mathscr{H}$, the linear hypothesis
of normal regression theory, defined in Section 13.3(b). The likelihood
ratio $\lambda$ for this hypothesis is of the form [see (13.3.9)]

(18.5.1) $$\lambda = \left[ \frac{S_\Omega}{S_\Omega + (S_\omega - S_\Omega)} \right]^{n/2}$$

where $S_\Omega$ and $S_\omega - S_\Omega$ are independent random variables whose distributions
are the one-dimensional Wishart distributions $W(1, m_1, \sigma^2)$ and
$W(1, m_2, \sigma^2)$, respectively, if $\mathscr{H}$ is true, where $m_1$ and $m_2$ are
the numbers of degrees of freedom in $S_\Omega$ and $S_\omega - S_\Omega$ respectively. As
pointed out in Section 13.3(i), the likelihood ratio $\lambda$ is equivalent to
the Snedecor $\mathscr{F}$ test, where

(18.5.2) $$\mathscr{F} = \frac{m_1 (S_\omega - S_\Omega)}{m_2 S_\Omega}.$$

For testing $\mathscr{H}$, $\lambda$ is equivalent to $\lambda^{2/n}$, of course, which is the ratio
inside [ ] in (18.5.1). The numerator and denominator of this ratio are
simply one-dimensional scatters with the numerator being the smaller.

The multidimensional extension of the Model I analysis of variance test
in its most general form is a generalization of the ratio inside [ ] in
(18.5.1). The generalization is a ratio of the form

(18.5.3)    $R_s = \dfrac{|a_{ij}|}{\left|a_{ij} + \sum_{\xi=1}^{s} b_{i\xi} b_{j\xi}\right|}$

$i, j = 1, \ldots, k$, $k \le m$, where the $\{a_{ij}\}$ have the Wishart
distribution $W(k, m, \|\sigma_{ij}\|)$ and where $(b_{1\xi}, \ldots, b_{k\xi})$,
$\xi = 1, \ldots, s$, are $s$ independent vectors of random variables all having
the normal distribution $N(\{0\}; \|\sigma_{ij}\|)$ and are all independent of
the $\{a_{ij}\}$.

Using the same method for determining the $r$th moment of $R_s$ as that
used for finding the $r$th moment of $R_1$ as given by (18.4.8), we find

(18.5.4)    $\mathscr{E}(R_s^r) = \prod_{i=1}^{k} \dfrac{\Gamma\left(\frac{m+1-i}{2} + r\right)\Gamma\left(\frac{m+s+1-i}{2}\right)}{\Gamma\left(\frac{m+1-i}{2}\right)\Gamma\left(\frac{m+s+1-i}{2} + r\right)}$.
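Note that if $s \ge k$, $\mathscr{E}(R_s^r)$ can be expressed as follows:

(18.5.4a)    $\mathscr{E}(R_s^r) = \prod_{i=1}^{k} \dfrac{\Gamma\left(\frac{m+1-i}{2} + r\right)\Gamma\left(\frac{m+1-i}{2} + \frac{s}{2}\right)}{\Gamma\left(\frac{m+1-i}{2}\right)\Gamma\left(\frac{m+1-i}{2} + \frac{s}{2} + r\right)}$

from which it is seen that the distribution of $R_s$ is identical with the
distribution of the product $\prod_{i=1}^{k} z_i$ where the $z_i$ are independent
random variables having the beta distributions

    $Be\left(\dfrac{m+1-i}{2}, \dfrac{s}{2}\right)$,  $i = 1, \ldots, k$,

respectively.

Similarly, if $1 \le s \le k$, $\mathscr{E}(R_s^r)$ can be expressed in the
following form:

(18.5.4b)    $\mathscr{E}(R_s^r) = \prod_{i=1}^{s} \dfrac{\Gamma\left(\frac{m-k+i}{2} + r\right)\Gamma\left(\frac{m+i}{2}\right)}{\Gamma\left(\frac{m-k+i}{2}\right)\Gamma\left(\frac{m+i}{2} + r\right)}$

from which it is seen that the distribution of $R_s$ is identical with that of
the product $\prod_{i=1}^{s} w_i$ where the $w_i$ are independent random variables
having the beta distributions

    $Be\left(\dfrac{m-k+i}{2}, \dfrac{k}{2}\right)$,  $i = 1, \ldots, s$,

respectively.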

Summarizing, we have the following result:

18.5.1 Suppose $\|a_{ij}\|$ is a symmetric matrix, positive definite with
probability 1, whose elements are random variables having the Wishart
distribution $W(k, m, \|\sigma_{ij}\|)$ and let

    $(b_{1\xi}, \ldots, b_{k\xi})$,  $\xi = 1, \ldots, s$,

be $s$ independent $k$-dimensional random variables all having the
normal distribution $N(\{0\}; \|\sigma_{ij}\|)$ and which are also independent
of the random variables in $\|a_{ij}\|$. Let $R_s$ be the ratio defined in
(18.5.3). Then:

(i) If $s \ge k$, the distribution of $R_s$ is identical with the distribution
of the product of $k$ independent random variables having the
beta distributions

    $Be\left(\dfrac{m+1-i}{2}, \dfrac{s}{2}\right)$,  $i = 1, \ldots, k$.

(ii) If $1 \le s \le k$, the distribution of $R_s$ is identical with the
distribution of the product of $s$ independent random variables
having the beta distributions

    $Be\left(\dfrac{m-k+i}{2}, \dfrac{k}{2}\right)$,  $i = 1, \ldots, s$.
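
The two product representations in 18.5.1 can be compared numerically through their moments. The sketch below is an illustration only: it assumes the beta parameters are $((m+1-i)/2,\, s/2)$ for $i = 1, \ldots, k$ in case (i) and $((m-k+i)/2,\, k/2)$ for $i = 1, \ldots, s$ in case (ii), and the function names are hypothetical helpers introduced for this check.

```python
import math

def log_beta_moment(a, b, r):
    """log E[X^r] for X ~ Be(a, b), i.e. log of
    Gamma(a + r) Gamma(a + b) / (Gamma(a) Gamma(a + b + r))."""
    return (math.lgamma(a + r) + math.lgamma(a + b)
            - math.lgamma(a) - math.lgamma(a + b + r))

def moment_form_i(k, m, s, r):
    """r-th moment of a product of k independent Be((m+1-i)/2, s/2) variables."""
    return math.exp(sum(log_beta_moment((m + 1 - i) / 2, s / 2, r)
                        for i in range(1, k + 1)))

def moment_form_ii(k, m, s, r):
    """r-th moment of a product of s independent Be((m-k+i)/2, k/2) variables."""
    return math.exp(sum(log_beta_moment((m - k + i) / 2, k / 2, r)
                        for i in range(1, s + 1)))

# Compare the two forms for a few (k, m, s) and the first three moments.
for k, m, s in [(3, 10, 1), (3, 10, 2), (3, 10, 3), (2, 7, 5)]:
    for r in (1, 2, 3):
        print(k, m, s, r, moment_form_i(k, m, s, r), moment_form_ii(k, m, s, r))
```

For the parameter values tried here the two columns agree to rounding error, including values of $s$ outside the range for which each form is stated. That agreement is an observation from this sketch, not a claim made in the text.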


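It will be observed that $R_1$ as defined in (18.4.1) is a special case of $R_s$
with $s = 1$, $m = n - 1$, $a_{ij} = u_{ij}$ and $b_i = \sqrt{n}(\bar{x}_i - \mu_i)$,
$i, j = 1, \ldots, k$. Similarly, $R_1'$ in (18.4.11) or (18.4.15) is a special
case of $R_s$ with $s = 1$, $m = n_1 + n_2 - 2$,
$a_{ij} = u_{ij}^{(1)} + u_{ij}^{(2)}$, and

    $b_i = \sqrt{\dfrac{n_1 n_2}{n_1 + n_2}}\,(\bar{x}_i^{(1)} - \bar{x}_i^{(2)})$,  $i, j = 1, \ldots, k$.

An interesting special case of $R_s$ occurs for $s = 2$, in which case
$\sqrt{R_2}$ has the beta distribution $Be(m + 1 - k, k)$, that is, the
probability element of $\sqrt{R_2}$ is

(18.5.5)    $\dfrac{\Gamma(m+1)}{\Gamma(m+1-k)\Gamma(k)} (\sqrt{R_2})^{m-k} (1 - \sqrt{R_2})^{k-1}\, d\sqrt{R_2}$.

To verify this we start with the $r$th moment of $R_2$ which we find from
(18.5.4b), to be

(18.5.6)    $\mathscr{E}(R_2^r) = \dfrac{\Gamma\left(\frac{m+1-k}{2} + r\right)\Gamma\left(\frac{m+2-k}{2} + r\right)\Gamma\left(\frac{m+1}{2}\right)\Gamma\left(\frac{m+2}{2}\right)}{\Gamma\left(\frac{m+1-k}{2}\right)\Gamma\left(\frac{m+2-k}{2}\right)\Gamma\left(\frac{m+1}{2} + r\right)\Gamma\left(\frac{m+2}{2} + r\right)}$.

Using the Legendre duplication formula (7.6.15), that is,

(18.5.7)    $\Gamma(v)\Gamma(v + \tfrac{1}{2}) = \dfrac{\sqrt{\pi}\,\Gamma(2v)}{2^{2v-1}}$,

we can telescope four pairs of gamma functions in (18.5.6) and obtain

(18.5.8)    $\mathscr{E}(R_2^r) = \dfrac{\Gamma(m+1)\Gamma(m+1-k+2r)}{\Gamma(m+1+2r)\Gamma(m+1-k)}$

which may be written as

(18.5.9)    $\mathscr{E}(R_2^r) = \dfrac{\Gamma(m+1)}{\Gamma(m+1-k)\Gamma(k)} \int_0^1 z^{2r}\, z^{m-k}(1-z)^{k-1}\, dz$,  $r = 1, 2, \ldots$.

It is seen from (18.5.9) and 5.5.1a that the distribution of $R_2$ is identical
with that of $z^2$ where $z$ has the beta distribution $Be(m + 1 - k, k)$. Hence
$\sqrt{R_2}$ has this distribution, that is, it has the probability element
given by (18.5.5). An expression for the distribution of $R_s$ in the form of an
integral has been given by Wilks (1932).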
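
The telescoping step can be checked numerically. The sketch below assumes that (18.5.6) is the product of four gamma-function ratios obtained from (18.5.4b) with $s = 2$, and that (18.5.8) is $\Gamma(m+1)\Gamma(m+1-k+2r) / [\Gamma(m+1+2r)\Gamma(m+1-k)]$; the function names are illustrative only. It also compares both with the moment of order $2r$ of a $Be(m+1-k, k)$ variable.

```python
import math

def moment_four_gammas(m, k, r):
    """r-th moment of R_2 as a product of four gamma ratios (form before telescoping)."""
    num = (math.lgamma((m - k + 1) / 2 + r) + math.lgamma((m - k + 2) / 2 + r)
           + math.lgamma((m + 1) / 2) + math.lgamma((m + 2) / 2))
    den = (math.lgamma((m - k + 1) / 2) + math.lgamma((m - k + 2) / 2)
           + math.lgamma((m + 1) / 2 + r) + math.lgamma((m + 2) / 2 + r))
    return math.exp(num - den)

def moment_telescoped(m, k, r):
    """Same moment after applying the duplication formula to each pair."""
    return math.exp(math.lgamma(m + 1) + math.lgamma(m + 1 - k + 2 * r)
                    - math.lgamma(m + 1 + 2 * r) - math.lgamma(m + 1 - k))

def beta_moment(a, b, t):
    """E[z^t] for z ~ Be(a, b)."""
    return math.exp(math.lgamma(a + t) + math.lgamma(a + b)
                    - math.lgamma(a) - math.lgamma(a + b + t))

m, k = 10, 3
for r in (1, 2, 3):
    print(r, moment_four_gammas(m, k, r), moment_telescoped(m, k, r),
          beta_moment(m + 1 - k, k, 2 * r))
```

The three columns agree to rounding error for each $r$, which is the moment identity behind the statement that $R_2$ is distributed like $z^2$.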

Example. Suppose $(x_{1\xi_\gamma}^{(\gamma)}, \ldots, x_{k\xi_\gamma}^{(\gamma)};\ \xi_\gamma = 1, \ldots, n_\gamma)$,
$\gamma = 1, 2, 3$, are three independent samples from identical normal
distributions $N(\{\mu_i\}; \|\sigma_{ij}\|)$. Let $\|u_{ij}^{(\gamma)}\|$,
$\gamma = 1, 2, 3$, be the internal scatter matrices of the three samples and let
$\|u_{ij}\|$ be the internal scatter matrix of the grand sample obtained by
pooling the three samples. Let
$\|u_{ij}^{(1)} + u_{ij}^{(2)} + u_{ij}^{(3)}\| = \|u_{ij}^{W}\|$ be the
within-sample scatter matrix for the three samples. Then the ratio

(18.5.10)    $R_2' = \dfrac{|u_{ij}^{W}|}{|u_{ij}|}$

is the three-sample extension of the ratio $R_1'$ in (18.4.15). $R_2'$ is a
special case of $R_s$ in (18.5.3) with $s = 2$ and $m = n_1 + n_2 + n_3 - 3$, and
hence the probability element of $\sqrt{R_2'}$ is given by the expression
(18.5.5) with $m$ replaced by $n_1 + n_2 + n_3 - 3$.

It will be noted that the system of random variables $\{u_{ij}^{W}\}$ has the
Wishart distribution $W(k, n_1 + n_2 + n_3 - 3, \|\sigma_{ij}\|)$. Furthermore,
the between-sample scatter matrix $\|u_{ij}^{B}\|$ is defined by

(18.5.11)    $u_{ij}^{B} = u_{ij} - u_{ij}^{W} = \sum_{\gamma=1}^{3} n_\gamma (\bar{x}_i^{(\gamma)} - \bar{x}_i)(\bar{x}_j^{(\gamma)} - \bar{x}_j)$

$i, j = 1, \ldots, k$, where $(\bar{x}_1, \ldots, \bar{x}_k)$ is the mean of the
grand sample. Two independent vectors $(b_{1\xi}, \ldots, b_{k\xi})$,
$\xi = 1, 2$, can be found, which are also independent of $\{u_{ij}^{W}\}$,
where each $b_{i\xi}$ is a linear combination of the sample means
$\bar{x}_i^{(\gamma)}$, $\gamma = 1, 2, 3$, such that the components of each
vector have the distribution $N(\{0\}; \|\sigma_{ij}\|)$ and such that the
right-hand side of (18.5.11) is $\sum_{\xi=1}^{2} b_{i\xi} b_{j\xi}$. However,
it is not necessary to determine these vectors in order to establish the
distribution of $R_2'$. This distribution can be readily found from its
moments by a procedure similar to that by which the distribution of $R_1'$ was
found.

It can be shown, of course, that $R_2'$ is equivalent to the three-sample
Neyman-Pearson extension of the likelihood ratio described at the end of
Section 18.4(b).


18.6 PRINCIPAL COMPONENTS 

(a) Eigenvalues and Eigenvectors of a Scatter Matrix 
In this section we shall deal with the following problem: Consider a
sample of size $n$ from a $k$-dimensional distribution, $k < n$. This sample
may be represented geometrically as a sample cluster of $n$ points in a
$k$-dimensional Euclidean space $R_k$. Suppose we wish to project this cluster
orthogonally onto an $s$-dimensional Euclidean space $R_s$, $s < k$, so as to
obtain the greatest possible $s$-dimensional scatter of the projected points.
The problem is to determine (i) the direction of projection with respect to
the coordinate system of $R_k$ and (ii) the size of the scatter in $R_s$.
The solution of this statistical problem can be stated in the following
result due to Hotelling (1933):

18.6.1 Suppose $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$ is a sample of
size $n > k$ from a $k$-dimensional distribution whose covariance matrix is
positive definite. Let $\|u_{ij}\|$ be the internal scatter matrix of this
sample and let it be positive definite with probability 1. Let

(18.6.1)    $(c_{1p}, \ldots, c_{kp})$,  $p = 1, \ldots, s$

be $s$ $k$-dimensional unit vectors, that is, such that
$\sum_{i=1}^{k} c_{ip}^2 = 1$, $p = 1, \ldots, s$, and set

(18.6.2)    $z_{p\xi} = \sum_{i=1}^{k} c_{ip} x_{i\xi}$,  $p = 1, \ldots, s$.

Let $\|\bar{u}_{pq}\|$ be the internal scatter matrix of the sample
$(z_{1\xi}, \ldots, z_{s\xi};\ \xi = 1, \ldots, n)$.

The values of the vectors (18.6.1) which maximize the scatter
$|\bar{u}_{pq}|$ are the solutions of the $s$ sets of equations

(18.6.3)    $\sum_{j=1}^{k} (u_{ij} - l_p \delta_{ij}) c_{jp} = 0$,  $i = 1, \ldots, k$

$p = 1, \ldots, s$, where $l_1, \ldots, l_s$ are the $s$ largest roots of the
characteristic equation

    $|u_{ij} - l\,\delta_{ij}| = 0$,

$\delta_{ij}$ being the Kronecker $\delta$, and where $l_s < \cdots < l_1$ with
probability 1. Furthermore, these vectors are orthogonal, and the maximum
value of $|\bar{u}_{pq}|$ is the product $l_1 l_2 \cdots l_s$.

To prove 18.6.1 consider the following general linear function of the
components of each element of the sample:

(18.6.4)    $z_\xi = \sum_{i=1}^{k} c_i x_{i\xi}$,  $\xi = 1, \ldots, n$

and form the sum of squares

(18.6.5)    $Q = \sum_{\xi=1}^{n} (z_\xi - \bar{z})^2 = \sum_{i,j=1}^{k} u_{ij} c_i c_j$

where $\sum_{i=1}^{k} c_i^2 = 1$.

Now let us find the values of $(c_1, \ldots, c_k)$ for which $Q$ is stationary.
Using the method of Lagrange multipliers, we obtain vectors from the
equations:

(18.6.6)    $\dfrac{\partial \varphi}{\partial c_i} = 0$,  $i = 1, \ldots, k$

where

    $\varphi = Q + l\left(1 - \sum_{i=1}^{k} c_i^2\right)$.

The equations (18.6.6) are

(18.6.7)    $\sum_{j=1}^{k} (u_{ij} - l\,\delta_{ij}) c_j = 0$,  $i = 1, \ldots, k$.

To obtain solutions of (18.6.7) other than the trivial solution $c_i = 0$,
$i = 1, \ldots, k$, which, of course, is ruled out by the assumption that
$\sum_{i=1}^{k} c_i^2 = 1$, it is necessary for

(18.6.8)    $|u_{ij} - l\,\delta_{ij}| = 0$.

Since the sample of size $n > k$ is assumed to be from a $k$-dimensional
distribution and since $\|u_{ij}\|$ is assumed to be positive definite with
probability 1, (18.6.8) has $k$ positive roots which we assume to be different,
that is $0 < l_k < \cdots < l_1$ with probability 1. These roots are called the
eigenvalues of the matrix $\|u_{ij}\|$. These roots are also called the
characteristic roots, or latent roots of the matrix $\|u_{ij}\|$.

If $l_{p'}$ is any one of these roots, then $(c_{1p'}, \ldots, c_{kp'})$ is the
$p'$th eigenvector and satisfies

(18.6.9)    $\sum_{j=1}^{k} (u_{ij} - l_{p'} \delta_{ij}) c_{jp'} = 0$.

If we multiply (18.6.9) by $c_{ip'}$ and sum over $i$ we get

(18.6.10)    $\sum_{i,j=1}^{k} (u_{ij} - l_{p'} \delta_{ij}) c_{ip'} c_{jp'} = 0$

from which it follows that

(18.6.11)    $l_{p'} = \sum_{i,j=1}^{k} u_{ij} c_{ip'} c_{jp'}$.

Similarly, for any two roots $l_{p'}$, $l_{q'}$, and $p' \ne q'$, we have

(18.6.12)    $\sum_{i,j=1}^{k} (u_{ij} - l_{p'} \delta_{ij}) c_{jp'} c_{iq'} = 0$,
             $\sum_{i,j=1}^{k} (u_{ij} - l_{q'} \delta_{ij}) c_{jq'} c_{ip'} = 0$

from which we find (subtracting the first equation from the second)

(18.6.13)    $(l_{p'} - l_{q'}) \sum_{i=1}^{k} c_{ip'} c_{iq'} = 0$.

Therefore, for $p' \ne q'$ we have

(18.6.14)    $\sum_{i=1}^{k} c_{ip'} c_{iq'} = 0$

that is, the eigenvectors $(c_{1p}, \ldots, c_{kp})$, $p = 1, \ldots, k$, are
mutually orthogonal. These vectors are also called characteristic vectors or
latent vectors of $\|u_{ij}\|$.

Using the result (18.6.14) in either of the equations (18.6.12), we find
that for $p' \ne q'$,

(18.6.15)    $\sum_{i,j=1}^{k} u_{ij} c_{ip'} c_{jq'} = 0$.

Now suppose we choose the $s$ eigenvectors corresponding to the $s$
($s < k$) largest roots $l_s < \cdots < l_1$ and form the sample
$(z_{1\xi}, \ldots, z_{s\xi};\ \xi = 1, \ldots, n)$ where

    $z_{p\xi} = \sum_{i=1}^{k} c_{ip} x_{i\xi}$.

It follows from the orthogonality of the unit eigenvectors that this sample
is an orthogonal projection of the original sample onto an $s$-dimensional
Euclidean space. The internal scatter of this projected sample is therefore
$|\bar{u}_{pq}|$, $p, q = 1, \ldots, s$, where

(18.6.16)    $\bar{u}_{pq} = \sum_{\xi=1}^{n} (z_{p\xi} - \bar{z}_p)(z_{q\xi} - \bar{z}_q) = \sum_{i,j=1}^{k} u_{ij} c_{ip} c_{jq}$.

But making use of (18.6.11) and (18.6.15) we see that $\|\bar{u}_{pq}\|$ is a
diagonal matrix with diagonal elements $l_1, \ldots, l_s$, so that

(18.6.17)    $|\bar{u}_{pq}| = l_1 l_2 \cdots l_s$

which concludes the proof of 18.6.1.

The reader should note that if we take $s = k$, then
$|\bar{u}_{pq}| = |u_{ij}| = l_1 l_2 \cdots l_k$.
This is also evident from the fact that the product of the roots of (18.6.8)
has the value $|u_{ij}|$.

Referring again to (18.6.8) it will be seen that

(18.6.18)    $l_1 + \cdots + l_k = \sum_{i=1}^{k} u_{ii}$.

In other words, the sum of the eigenvalues of $\|u_{ij}\|$ is equal to the sum
of the scatters of the individual components of the sample, each taken
about its own mean. The eigenvalues $l_1, \ldots, l_k$ of the internal scatter
matrix $\|u_{ij}\|$ of the sample, each divided by $n - 1$, are called the
principal components of the total sample variance by Hotelling (1933), since

(18.6.19)    $\dfrac{l_1}{n-1} + \cdots + \dfrac{l_k}{n-1} = s_1^2 + \cdots + s_k^2$

where $s_i^2 = u_{ii}/(n - 1)$, as will be seen from (18.6.18).
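
The statements about the product and the sum of the eigenvalues can be illustrated on a small simulated sample. The following is a minimal sketch using only the Python standard library; the cyclic Jacobi rotation routine is a stand-in chosen here for finding the roots of the characteristic equation and is not the method of the text, and the sample, the seed, and all function names are assumptions made for the illustration.

```python
import math
import random

def scatter_matrix(rows):
    """Internal scatter matrix: sums of products of deviations from the means."""
    n, k = len(rows), len(rows[0])
    mean = [sum(r[i] for r in rows) / n for i in range(k)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows)
             for j in range(k)] for i in range(k)]

def jacobi_eigen(matrix, sweeps=50):
    """Eigenvalues (descending) and unit eigenvectors of a symmetric matrix,
    computed with cyclic Jacobi rotations (rotation angles at most pi/4)."""
    k = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(k)] for i in range(k)]
    for _ in range(sweeps):
        for p in range(k - 1):
            for q in range(p + 1, k):
                if abs(a[p][q]) < 1e-14:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for i in range(k):  # rotate columns p, q of a and of v
                    a[i][p], a[i][q] = c * a[i][p] - s * a[i][q], s * a[i][p] + c * a[i][q]
                    v[i][p], v[i][q] = c * v[i][p] - s * v[i][q], s * v[i][p] + c * v[i][q]
                for j in range(k):  # rotate rows p, q of a
                    a[p][j], a[q][j] = c * a[p][j] - s * a[q][j], s * a[p][j] + c * a[q][j]
    pairs = sorted(((a[i][i], [v[r][i] for r in range(k)]) for i in range(k)),
                   key=lambda t: -t[0])
    return [val for val, _ in pairs], [vec for _, vec in pairs]

random.seed(0)
n, k, s = 20, 3, 2
sample = [[random.gauss(0, 1), random.gauss(0, 2), random.gauss(0, 3)]
          for _ in range(n)]
u = scatter_matrix(sample)
roots, vectors = jacobi_eigen(u)

# Project onto the s eigenvectors belonging to the s largest roots.
z = [[sum(c[i] * x[i] for i in range(k)) for c in vectors[:s]] for x in sample]
u_bar = scatter_matrix(z)
det_u_bar = u_bar[0][0] * u_bar[1][1] - u_bar[0][1] * u_bar[1][0]

# For comparison: orthogonal projection onto the first two coordinate axes.
det_axes = u[0][0] * u[1][1] - u[0][1] * u[1][0]
trace_u = sum(u[i][i] for i in range(k))

print(det_u_bar, roots[0] * roots[1])  # scatter of projection vs. product of roots
print(sum(roots), trace_u)             # sum of roots vs. sum of component scatters
print(det_axes <= det_u_bar)           # another projection gives no larger scatter
```

The first two printed pairs agree to rounding error, and the scatter of the projection onto the coordinate plane does not exceed that of the projection onto the two leading eigenvectors.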

(b) Sampling Theory of Eigenvalues of a Scatter Matrix 

Since the eigenvalues $l_1, \ldots, l_k$ are the roots of the determinantal
equation (18.6.8), where $\|u_{ij}\|$ is the internal scatter matrix of the
sample, the eigenvalues themselves are random variables. The question arises,
of course, as to what one can say about the sampling theory of these roots.
The problem of determining the sampling theory of the roots in exact
form has been solved only in the case where the sample
$(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$ essentially comes from a
$k$-dimensional spherical normal distribution, that is, one whose covariance
matrix $\|\sigma_{ij}\|$ has eigenvalues which are all equal. In this section we
shall present this sampling theory. This problem was originally solved
approximately simultaneously by Fisher (1939), Girshick (1939), Hsu (1939b),
Mood (1951), and Roy (1939). Mood's results were not published until twelve
years after they were obtained. More recently it has been treated by James
(1954), Olkin (1951), Olkin and Roy (1954), and others, using different methods.

The basic result can be expressed as follows:

18.6.2 Let $(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$ be a sample from
the normal distribution $N(\{\mu_i\}; \|\sigma_{ij}\|)$ and let $\|u_{ij}\|$ be
the internal scatter matrix of the sample. The roots
$0 < l_k < \cdots < l_1$ of the characteristic equation

(18.6.20)    $|u_{ij} - l\,\sigma_{ij}| = 0$

have a distribution with probability element

(18.6.21)    $\dfrac{\pi^{k/2}}{2^{k(n-1)/2} \prod_{i=1}^{k} \Gamma\left(\frac{n-i}{2}\right)\Gamma\left(\frac{k+1-i}{2}\right)} \left(\prod_{i=1}^{k} l_i\right)^{\frac{1}{2}(n-k-2)} \prod_{i>j=1}^{k} (l_j - l_i)\, \exp\left(-\tfrac{1}{2}\sum_{i=1}^{k} l_i\right) dl_1 \cdots dl_k$

in the region for which $0 < l_k < l_{k-1} < \cdots < l_1 < +\infty$, and
0 otherwise.


To establish 18.6.2 we proceed as follows. We know that $\{u_{ij}\}$ has the
Wishart distribution $W(k, n - 1, \|\sigma_{ij}\|)$. Furthermore, we know from
the theory of positive definite quadratic forms (see Birkhoff and MacLane
(1953), for instance) that we can find a nonsingular linear transformation

(18.6.22)    $(x_{i\xi} - \mu_i) = \sum_{j=1}^{k} c_{ji} y_{j\xi}$,  $i = 1, \ldots, k$,  $\xi = 1, \ldots, n$

so that $(y_{1\xi}, \ldots, y_{k\xi};\ \xi = 1, \ldots, n)$ is a sample from
$N(\{0\}; \|\delta_{ij}\|)$ where $\|\delta_{ij}\|$ is the unit matrix. Denoting
the matrices $\|c_{ij}\|$, $\|\sigma_{ij}\|$ and $\|\delta_{ij}\|$ by $c$,
$\sigma$ and $I$ and the inverses of $c$ and $\sigma$ by $c^{-1}$ and
$\sigma^{-1}$, and the transpose of a matrix with a dash, this means that

(18.6.23)    $c\sigma^{-1}c' = I$,  $(c^{-1})'\sigma c^{-1} = I$.

Hence, if $\|u_{ij}^*\|$ is the scatter matrix of the sample of $y$'s, then the
$\{u_{ij}^*\}$ have the Wishart distribution $W(k, n - 1, \|\delta_{ij}\|)$.
Denoting the matrices $\|u_{ij}\|$ and $\|u_{ij}^*\|$ by $u$ and $u^*$, we can
express $u$ and $u^*$ in terms of each other as follows:

(18.6.24)    $u = c'u^*c$,  $u^* = (c^{-1})'uc^{-1}$.

Now suppose $l_1 > \cdots > l_k$ are the roots of (18.6.20), that is, of

(18.6.25)    $|u - l\sigma| = 0$.

The probability of equality of two or more roots is zero. Multiplying
the matrix $\|u - l\sigma\|$ on the left by $(c^{-1})'$ and on the right by
$c^{-1}$ we obtain

(18.6.26)    $(c^{-1})'uc^{-1} - l(c^{-1})'\sigma c^{-1}$,

which, in view of the second equations in (18.6.23) and (18.6.24), is

(18.6.27)    $u^* - lI$.

The determinants of (18.6.26) and (18.6.27) are, of course, identical in
value. But the determinant of (18.6.26) is
$|(c^{-1})'| \cdot |u - l\sigma| \cdot |c^{-1}|$ in which the factor
$|u - l\sigma|$ vanishes for $l = l_1, \ldots, l_k$. Therefore $|u^* - lI|$
vanishes for these same values of $l$.

Hence, the distribution function of the roots of $|u - l\sigma| = 0$, where
$\{u_{ij}\}$ have the Wishart distribution $W(k, n - 1, \|\sigma_{ij}\|)$, is
identically the same as that of the roots of $|u^* - lI| = 0$, where
$\{u_{ij}^*\}$ have the Wishart distribution $W(k, n - 1, \|\delta_{ij}\|)$.
The probability element of $\{u_{ij}^*\}$ is

(18.6.28)    $\dfrac{|u_{ij}^*|^{\frac{1}{2}(n-k-2)} \exp\left(-\tfrac{1}{2}\sum_{i=1}^{k} u_{ii}^*\right) \prod_{i \ge j = 1}^{k} du_{ij}^*}{2^{k(n-1)/2}\, \pi^{k(k-1)/4} \prod_{i=1}^{k} \Gamma\left(\frac{n-i}{2}\right)}$.
From the theory of positive definite quadratic forms, there exists an
orthogonal matrix $\|e_{ij}\|$ of random variables and a set of positive random
variables $f_k < \cdots < f_1$ such that

(18.6.29)    $u^* = e'fe$

where $e$ is the matrix $\|e_{ij}\|$, and $f$ is a diagonal matrix with
$f_1, \ldots, f_k$ as its diagonal elements. Also, since $e$ is an orthogonal
matrix, we know that $e'e = I$. Thus, if we replace $u^*$ and $I$ in the matrix
$\|u^* - lI\|$ by $e'fe$ and $e'e$ respectively, we obtain

(18.6.30)    $u^* - lI = e'fe - le'e = e'(f - lI)e$.

Taking determinants of the matrices in (18.6.30), the roots
$l_k < \cdots < l_1$ are seen to be equal to $f_k < \cdots < f_1$ respectively,
since $|e| \ne 0$. Therefore, the distribution of the roots $l_1, \ldots, l_k$
is identically the same as the distribution of $f_1, \ldots, f_k$ respectively.
To find the distribution of $f_1, \ldots, f_k$ we apply the transformation
(18.6.29) to the probability element (18.6.28) and take the marginal
distribution with respect to $f_1, \ldots, f_k$. Since $e$ is an orthogonal
matrix, the $\{e_{ij}\}$ are functions of $\tfrac{1}{2}k(k - 1)$ parameters
which we may denote by $g_t$, $t = 1, \ldots, \tfrac{1}{2}k(k - 1)$. The
Jacobian of the transformation is the determinant with
$\tfrac{1}{2}k(k + 1)$ columns

(18.6.31)    $J = \begin{vmatrix} \dfrac{\partial u_{ij}^*}{\partial g_t} \\[2mm] \dfrac{\partial u_{ij}^*}{\partial f_{i'}} \end{vmatrix}$    (upper block: $\tfrac{1}{2}k(k-1)$ rows; lower block: $k$ rows)

where $i \ge j = 1, \ldots, k$; $i' = 1, \ldots, k$; $t = 1, \ldots, \tfrac{1}{2}k(k - 1)$.

Now any element in the $t$th row of the upper $\tfrac{1}{2}k(k - 1)$ rows of $J$
is a linear form in $f_1, \ldots, f_k$ whose coefficients depend only on the
$\{e_{ij}\}$ and their first derivatives with respect to $g_t$. Furthermore,
any element in the lower $k$ rows depends only on the $\{e_{ij}\}$. Hence, it is
evident that $J$ is a homogeneous polynomial in $f_1, \ldots, f_k$ of degree
$\tfrac{1}{2}k(k - 1)$, whose coefficients depend on the $\{e_{ij}\}$ and
their derivatives with respect to the $\{g_t\}$.

The transformation (18.6.29) is not unique if $f_i = f_j$ for any $i \ne j$.
Hence, for every $i > j$ we must have $(f_j - f_i)^{a_{ij}}$ as a factor of $J$
where $a_{ij}$ is a positive integer $\ge 1$. Therefore, $J$ must be of form
$\prod_{i>j=1}^{k} (f_j - f_i)^{a_{ij}}\, G$.

Now since each $a_{ij}$ is a positive integer $\ge 1$, since there are
$\tfrac{1}{2}k(k - 1)$ factors of form $(f_j - f_i)$, and since $J$ is a
homogeneous polynomial in $f_1, \ldots, f_k$ of degree $\tfrac{1}{2}k(k - 1)$,
it follows that every $a_{ij} = 1$ and that $G$ does not depend on the $f_i$,
but only on the $e_{ij}$ and their first derivatives with respect to the
$g_t$. Thus, $G$ depends only on the $g_t$ and may be written $G(\{g_t\})$.
Therefore we have

(18.6.32)    $J = \prod_{i>j=1}^{k} (f_j - f_i)\, G(\{g_t\})$.

Now since $|e_{ij}|^2 = 1$, we have from (18.6.29)

(18.6.33)    $|u_{ij}^*| = |e'| \cdot |f| \cdot |e| = f_1 \cdots f_k$.

Furthermore,

(18.6.34)    $\sum_{i=1}^{k} u_{ii}^* = \sum_{i=1}^{k} f_i$.

Making use of the results (18.6.32), (18.6.33), and (18.6.34) we therefore
obtain the following result after applying the transformation (18.6.29) to
(18.6.28):

(18.6.35)    $K\, G(\{g_t\}) \prod_{t=1}^{\frac{1}{2}k(k-1)} dg_t \left(\prod_{i=1}^{k} f_i\right)^{\frac{1}{2}(n-k-2)} \prod_{i>j=1}^{k} (f_j - f_i)\, \exp\left(-\tfrac{1}{2}\sum_{i=1}^{k} f_i\right) \prod_{i=1}^{k} df_i$

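where $K$ is a constant. Since the $\{g_t\}$ and $\{f_i\}$ are independent sets
of random variables, the p.e. of the $\{f_i\}$ is

(18.6.36)    $C \left\{ \left(\prod_{i=1}^{k} f_i\right)^{\frac{1}{2}(n-k-2)} \prod_{i>j=1}^{k} (f_j - f_i)\, \exp\left(-\tfrac{1}{2}\sum_{i=1}^{k} f_i\right) \right\} \prod_{i=1}^{k} df_i$

where $C$ is a constant to be determined. Denoting the function in { }
by $R(k, n, \{f_i\})$ we may write

(18.6.37)    $\dfrac{1}{C} = \int_{E_k} R(k, n, \{f_i\})\, df_1 \cdots df_k$

where $E_k$ is the region in $R_k$ for which
$0 < f_k < f_{k-1} < \cdots < f_1 < +\infty$. Denoting the integral on the
right of (18.6.37) by $\varphi_k\{\tfrac{1}{2}(n - k - 2)\}$ it will be observed
that since $|u_{ij}^*|^r = \left(\prod_{i=1}^{k} f_i\right)^r$, we can write the
$r$th moment of $|u_{ij}^*|$ as follows:

(18.6.38)    $\mathscr{E}(|u_{ij}^*|^r) = \dfrac{\varphi_k\left(\frac{n-k-2}{2} + r\right)}{\varphi_k\left(\frac{n-k-2}{2}\right)}$.

But since the system $\{u_{ij}^*\}$ has the Wishart distribution
$W(k, n - 1, \|\delta_{ij}\|)$, whose p.e. is (18.6.28), the value of the $r$th
moment of $|u_{ij}^*|$ is given by (18.2.33) with
$\|\sigma_{ij}\| = \|\delta_{ij}\|$ and $n$ replaced by $n - 1$. Therefore,

(18.6.39)    $\dfrac{\varphi_k\left(\frac{n-k-2}{2} + r\right)}{\varphi_k\left(\frac{n-k-2}{2}\right)} = 2^{kr} \prod_{i=1}^{k} \dfrac{\Gamma\left(\frac{n-i}{2} + r\right)}{\Gamma\left(\frac{n-i}{2}\right)}$.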

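If we put $r = -\tfrac{1}{2}(n - k - 2)$ it will be seen that

(18.6.40)    $C = \dfrac{1}{\varphi_k\left(\frac{n-k-2}{2}\right)} = \dfrac{2^{-\frac{1}{2}k(n-k-2)}}{\varphi_k(0)} \prod_{i=1}^{k} \dfrac{\Gamma\left(\frac{k+2-i}{2}\right)}{\Gamma\left(\frac{n-i}{2}\right)}$.

But

(18.6.41)    $\varphi_k(0) = \int_{E_k} \prod_{i>j=1}^{k} (f_j - f_i)\, \exp\left(-\tfrac{1}{2}\sum_{i=1}^{k} f_i\right) df_1 \cdots df_k$.

Making the transformation $f_k = f_k'$, $f_{k-1} = f_{k-1}' + f_k'$, $\ldots$,
$f_1 = f_1' + f_k'$, we find that

(18.6.42)    $\varphi_k(0) = \dfrac{2}{k}\, \varphi_{k-1}(1)$.

But we see from (18.6.39), by setting $n = k + 4$ and $r = -1$, that

(18.6.43)    $\dfrac{\varphi_k(0)}{\varphi_k(1)} = 2^{-k} \prod_{i=1}^{k} \dfrac{\Gamma\left(\frac{k+2-i}{2}\right)}{\Gamma\left(\frac{k+4-i}{2}\right)} = \dfrac{1}{(k+1)!}$.

It follows from (18.6.42) and (18.6.43) that

    $\varphi_k(1) = (k + 1)!\, \dfrac{2}{k}\, \varphi_{k-1}(1)$.

Replacing $k$ successively by $k - 1, k - 2, \ldots, 2$ and noting that
$\varphi_1(1) = 4$, we find that

(18.6.44)    $\varphi_k(1) = 2^k (k + 1)! \prod_{i=1}^{k} \Gamma(k + 1 - i)$

which, when substituted in (18.6.43), gives

(18.6.45)    $\varphi_k(0) = 2^k \prod_{i=1}^{k} \Gamma(k + 1 - i)$.

But making use of the Legendre duplication formula for gamma functions,

(18.6.46)    $\Gamma(k + 1 - i) = \dfrac{2^{k-i}}{\sqrt{\pi}}\, \Gamma\left(\dfrac{k+1-i}{2}\right) \Gamma\left(\dfrac{k+2-i}{2}\right)$

we find upon substituting this in (18.6.45) and then substituting the
resulting expression for $\varphi_k(0)$ in (18.6.40) that

(18.6.47)    $C = \dfrac{\pi^{k/2}}{2^{k(n-1)/2} \prod_{i=1}^{k} \Gamma\left(\frac{n-i}{2}\right) \Gamma\left(\frac{k+1-i}{2}\right)}$.

Putting this value of $C$ in (18.6.36), we finally obtain the distribution of
$f_1, \ldots, f_k$. But it will be recalled that $f_1, \ldots, f_k$ are
identically equal to the random variables $l_1, \ldots, l_k$, respectively.
Therefore the distribution of $l_1, \ldots, l_k$ is given by (18.6.36) simply by
relabeling $f_1, \ldots, f_k$ by $l_1, \ldots, l_k$, which yields (18.6.21),
thus completing the argument for 18.6.2.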
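
The chain of substitutions that leads to the normalizing constant can be checked numerically. The sketch below assumes that $\varphi_k(0) = 2^k \prod_{i=1}^{k} \Gamma(k+1-i)$, that $C = 2^{-k(n-k-2)/2} \prod_{i=1}^{k} [\Gamma((k+2-i)/2) / \Gamma((n-i)/2)] / \varphi_k(0)$, and that the closed form is $C = \pi^{k/2} / [2^{k(n-1)/2} \prod_{i=1}^{k} \Gamma((n-i)/2)\Gamma((k+1-i)/2)]$. These forms and the function names are assumptions of this illustration.

```python
import math

def log_phi_k0(k):
    """log of phi_k(0) = 2^k * prod_{i=1}^k Gamma(k + 1 - i)."""
    return k * math.log(2.0) + sum(math.lgamma(k + 1 - i) for i in range(1, k + 1))

def log_c_via_phi(k, n):
    """log C obtained from the moment relation, before the duplication formula."""
    gammas = sum(math.lgamma((k + 2 - i) / 2) - math.lgamma((n - i) / 2)
                 for i in range(1, k + 1))
    return -0.5 * k * (n - k - 2) * math.log(2.0) + gammas - log_phi_k0(k)

def log_c_closed(k, n):
    """log C in the closed form after the duplication formula is applied."""
    gammas = sum(math.lgamma((n - i) / 2) + math.lgamma((k + 1 - i) / 2)
                 for i in range(1, k + 1))
    return 0.5 * k * math.log(math.pi) - 0.5 * k * (n - 1) * math.log(2.0) - gammas

for k, n in [(1, 5), (2, 6), (3, 9), (4, 12)]:
    print(k, n, math.exp(log_c_via_phi(k, n)), math.exp(log_c_closed(k, n)))

# k = 1: the constant reduces to that of the chi-square density with n - 1
# degrees of freedom, 1 / (2^((n-1)/2) * Gamma((n-1)/2)).
n = 5
print(math.exp(log_c_closed(1, n)), 1.0 / (2 ** ((n - 1) / 2) * math.gamma((n - 1) / 2)))
```

For $k = 2$, $n = 6$ both forms give $1/24$, and for $k = 1$ the constant is that of the chi-square density with $n - 1$ degrees of freedom, as one would expect for the single root of a one-dimensional scatter.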

The distribution of $l_1, \ldots, l_k$ given by (18.6.21) is sometimes called
the distribution for the null case. The general case would be that in which the
covariance matrix in the normal distribution from which the sample
comes would be different from the matrix $\|\sigma_{ij}\|$ in (18.6.20). The
distribution of the eigenvalues in this more general case has been found by
James (1960). It is a considerably more complicated distribution than that
given by (18.6.21).


18.7 DISCRIMINANT ANALYSIS 


(a) Case of Two Samples 

The problem that we shall consider in this section is as follows: Suppose
we have two samples from $k$-dimensional distributions. These can be
represented geometrically as two sample clusters in Euclidean $k$-space.
We want to project these two sample clusters orthogonally onto a line
so that the variation between the two projected samples is as large as
possible, relative to the variation within the two projected samples.
The problem is to find the direction of projection which will accomplish
this. In other words, what we want to do is to project the two sample
clusters into one dimension so that the two sample clusters after
projection are as far apart as possible relative to the within-sample
variability. In practical situations, if we can find a direction of projecting
two $k$-dimensional sample clusters into one dimension so that the two
projected samples are reasonably well separated, whereas they would not
be thus separated by projecting into the space of one or some small
number of the $k$ components, we would have a way of discriminating
between samples from two distributions by a suitable linear combination
of the $k$ components of the vector on which the measurements are made in
the two samples. This problem was originally considered by Fisher (1938)
and the method of statistical analysis which was developed from the
solution of the problem is called discriminant analysis. We shall now consider
this problem more precisely.
Suppose $(x_{1\xi_\gamma}^{(\gamma)}, \ldots, x_{k\xi_\gamma}^{(\gamma)};\ \xi_\gamma = 1, \ldots, n_\gamma)$,
$\gamma = 1, 2$, where $n_1 > k$, $n_2 > k$, are two samples. Let
$(\bar{x}_1^{(\gamma)}, \ldots, \bar{x}_k^{(\gamma)})$, $\gamma = 1, 2$, be the
vectors of means and $\|u_{ij}^{(\gamma)}\|$, $\gamma = 1, 2$, the internal
scatter matrices of the two samples respectively, which are assumed to be
nonsingular with probability 1. Let $(\bar{x}_1, \ldots, \bar{x}_k)$ be the
vector of sample means, and $\|u_{ij}\|$ the internal scatter matrix of the
grand sample composed of the two samples pooled together, and
$\|u_{ij}^{W}\| = \|u_{ij}^{(1)} + u_{ij}^{(2)}\|$ the within-sample scatter
matrix for the two samples. The matrix
$\|u_{ij}^{B}\| = \|u_{ij} - u_{ij}^{W}\|$ is the between-sample scatter matrix.
For an arbitrary vector $(c_1, \ldots, c_k)$ let

(18.7.1)    $z_{\xi_\gamma}^{(\gamma)} = \sum_{i=1}^{k} c_i x_{i\xi_\gamma}^{(\gamma)}$,  $\xi_\gamma = 1, \ldots, n_\gamma$,  $\gamma = 1, 2$.

Then $(z_1^{(1)}, \ldots, z_{n_1}^{(1)})$ and $(z_1^{(2)}, \ldots, z_{n_2}^{(2)})$,
except for scaling, are one-dimensional samples obtained, respectively, by
projecting the original $k$-dimensional samples onto a line whose direction
cosines in the original $k$-dimensional space are proportional to
$(c_1, \ldots, c_k)$. Let $\bar{z}^{(1)}$ and $\bar{z}^{(2)}$ be the means of the
two samples of $z$'s and $\bar{z}$ the mean of the pooled samples. Let

(18.7.2)    $S_W = \sum_{\gamma=1}^{2} \sum_{\xi_\gamma=1}^{n_\gamma} (z_{\xi_\gamma}^{(\gamma)} - \bar{z}^{(\gamma)})^2$,
            $S_B = \sum_{\gamma=1}^{2} n_\gamma (\bar{z}^{(\gamma)} - \bar{z})^2$.

It will be noted that if $S$ is the scatter of the grand sample obtained by
pooling the two samples of $z$'s, then $S = S_W + S_B$; $S_W$ is the
within-sample component and $S_B$ the between-sample component of $S$. Now
the basic problem is to determine $(c_1, \ldots, c_k)$ so as to maximize
$S_B/S_W$ (that is, to minimize $S_W/(S_W + S_B)$) for a fixed value of $S_W$.

The basic results concerning this problem may be stated as follows:

18.7.1 Let $\|u_{ij}^{W}\|$ and $\|u_{ij}^{B}\|$ be within-sample and
between-sample scatter matrices of two samples from a $k$-dimensional
distribution where the sample sizes both exceed $k$ and where $\|u_{ij}^{W}\|$
is positive definite with probability 1. Let $S_W$ and $S_B$ be defined as in
(18.7.2). The value of $(c_1, \ldots, c_k)$, say $(c_1', \ldots, c_k')$, which
minimizes the ratio

(18.7.3)    $Q = \dfrac{S_W}{S_W + S_B}$

so that $S_W$ has a fixed value $C \ne 0$, is the solution of the equation

(18.7.4)    $\sum_{j=1}^{k} (u_{ij}^{B} - l_1 u_{ij}^{W}) c_j' = 0$,  $i = 1, \ldots, k$,

where $l_1$ is the nonzero root of the characteristic equation

(18.7.5)    $|u_{ij}^{B} - l\,u_{ij}^{W}| = 0$

given by

(18.7.6)    $l_1 = \dfrac{n_1 n_2}{n_1 + n_2} \sum_{i,j=1}^{k} u_W^{ij} (\bar{x}_i^{(1)} - \bar{x}_i^{(2)})(\bar{x}_j^{(1)} - \bar{x}_j^{(2)}) = \sum_{i,j=1}^{k} u_W^{ij} b_i b_j$

where

(18.7.7)    $\|u_W^{ij}\| = \|u_{ij}^{W}\|^{-1}$

and $b_i$ is given by (18.4.14). Furthermore, the minimum value of $Q$
for $S_W = C \ne 0$ is $1/(1 + l_1)$.

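To prove 18.7.1 observe that

(18.7.8)    $S_B = \sum_{\gamma=1}^{2} n_\gamma \left[\sum_{i=1}^{k} c_i (\bar{x}_i^{(\gamma)} - \bar{x}_i)\right]^2 = \sum_{i,j=1}^{k} u_{ij}^{B} c_i c_j$,
            $S_W = \sum_{\gamma=1}^{2} \sum_{\xi_\gamma=1}^{n_\gamma} \left[\sum_{i=1}^{k} c_i (x_{i\xi_\gamma}^{(\gamma)} - \bar{x}_i^{(\gamma)})\right]^2 = \sum_{i,j=1}^{k} u_{ij}^{W} c_i c_j$.

To minimize $S_W/(S_W + S_B)$ subject to the condition that $S_W = C \ne 0$
is equivalent to (using a Lagrange multiplier $l$) maximizing
$S_B + (C - S_W)l$, which we denote by $\varphi$, say, with respect to
$(c_1, \ldots, c_k)$ and $l$. The maximum of $\varphi$ is given by the solution
of the equations

(18.7.9)    $\dfrac{\partial \varphi}{\partial c_i} = 0$,  $i = 1, \ldots, k$;    $\dfrac{\partial \varphi}{\partial l} = 0$

which may be written as

(18.7.10)    $\sum_{j=1}^{k} (u_{ij}^{B} - l\,u_{ij}^{W}) c_j = 0$,  $i = 1, \ldots, k$;    $C - S_W = 0$.

To have a solution $(c_1', \ldots, c_k')$ other than $(0, \ldots, 0)$ for
(18.7.10) it is necessary that

(18.7.11)    $|u_{ij}^{B} - l\,u_{ij}^{W}| = 0$.

Recalling from Section 18.4(b) that

    $u_{ij}^{B} = b_i b_j$

where, as we have seen in (18.4.14),

(18.7.12)    $b_i = \sqrt{\dfrac{n_1 n_2}{n_1 + n_2}}\,(\bar{x}_i^{(1)} - \bar{x}_i^{(2)})$,

we find that (18.7.11) is equivalent to

    $|b_i b_j - l\,u_{ij}^{W}| = 0$

which can be written as

(18.7.13)    $(-l)^k\, |u_{ij}^{W}| \cdot \left[1 - \dfrac{1}{l} \sum_{i,j=1}^{k} u_W^{ij} b_i b_j\right] = 0$

where $\|u_W^{ij}\| = \|u_{ij}^{W}\|^{-1}$. The nonzero root $l_1$ of (18.7.11)
is therefore given by

(18.7.14)    $l_1 = \sum_{i,j=1}^{k} u_W^{ij} b_i b_j$.

Therefore, the solution $(c_1', \ldots, c_k')$ of (18.7.10) satisfies

(18.7.15)    $\sum_{j=1}^{k} (u_{ij}^{B} - l_1 u_{ij}^{W}) c_j' = 0$,  $i = 1, \ldots, k$.

If (18.7.15) is multiplied by $c_i'$ and summed over $i$, recalling that

    $S_W = \sum_{i,j=1}^{k} u_{ij}^{W} c_i' c_j' = C$,

we find

(18.7.16)    $\sum_{i,j=1}^{k} u_{ij}^{B} c_i' c_j' - l_1 C = 0$,

for which, as stated in (18.7.6),

    $l_1 = \sum_{i,j=1}^{k} u_W^{ij} b_i b_j$

where $b_i$ is given by (18.7.12).

Substituting $c_i'$ for $c_i$ in (18.7.8) and the resulting expressions for
$S_B$ and $S_W$ into the ratio $Q$, and making use of (18.7.16), we note that
the minimum value of $Q$ is $1/(1 + l_1)$, thus completing the argument for
18.7.1.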
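
A small numerical illustration of 18.7.1 for $k = 2$ follows. It is a sketch under stated assumptions: the two simulated samples, the seed, and the helper names are invented for the example, and the discriminant direction is taken to be any multiple of the solution $c'$ of $\|u_{ij}^{W}\| c' = b$, which satisfies the characteristic equations when $u_{ij}^{B} = b_i b_j$. The ratio $Q$ is computed directly from the projected samples rather than from the matrix formulas.

```python
import math
import random

def project(sample, c):
    """One-dimensional sample obtained by projecting onto direction c."""
    return [sum(ci * xi for ci, xi in zip(c, x)) for x in sample]

def q_ratio(sample1, sample2, c):
    """Q = S_W / (S_W + S_B), computed from the two projected samples."""
    z1, z2 = project(sample1, c), project(sample2, c)
    mean1, mean2 = sum(z1) / len(z1), sum(z2) / len(z2)
    grand = (sum(z1) + sum(z2)) / (len(z1) + len(z2))
    s_w = sum((z - mean1) ** 2 for z in z1) + sum((z - mean2) ** 2 for z in z2)
    s_b = len(z1) * (mean1 - grand) ** 2 + len(z2) * (mean2 - grand) ** 2
    return s_w / (s_w + s_b)

def column_means(sample):
    return [sum(x[i] for x in sample) / len(sample) for i in range(len(sample[0]))]

def scatter_matrix(sample):
    m = column_means(sample)
    k = len(m)
    return [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in sample)
             for j in range(k)] for i in range(k)]

random.seed(1)
sample1 = [[random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)] for _ in range(12)]
sample2 = [[random.gauss(1.5, 1.0), random.gauss(1.0, 1.0)] for _ in range(15)]
n1, n2 = len(sample1), len(sample2)

u1, u2 = scatter_matrix(sample1), scatter_matrix(sample2)
u_w = [[u1[i][j] + u2[i][j] for j in range(2)] for i in range(2)]
m1, m2 = column_means(sample1), column_means(sample2)
b = [math.sqrt(n1 * n2 / (n1 + n2)) * (m1[i] - m2[i]) for i in range(2)]

# Solve u_w c = b with the 2 x 2 inverse written out explicitly.
det = u_w[0][0] * u_w[1][1] - u_w[0][1] * u_w[1][0]
c_opt = [(u_w[1][1] * b[0] - u_w[0][1] * b[1]) / det,
         (-u_w[1][0] * b[0] + u_w[0][0] * b[1]) / det]
l1 = b[0] * c_opt[0] + b[1] * c_opt[1]  # nonzero root: sum of u_W^{ij} b_i b_j

q_opt = q_ratio(sample1, sample2, c_opt)
q_others = [q_ratio(sample1, sample2, [math.cos(t / 10), math.sin(t / 10)])
            for t in range(32)]
print(q_opt, 1.0 / (1.0 + l1), min(q_others))
```

Because $Q$ is unchanged when $c$ is multiplied by a nonzero constant, the side condition $S_W = C$ only fixes the scale of the vector and is not imposed in the sketch. The first two printed values agree to rounding error, and no direction on the scanned grid gives a smaller $Q$.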

The problem of discriminant analysis for the case of two small samples
from normal distributions in a large number of dimensions, that is,
where $n_1$ and $n_2$ are less than $k$, has been treated by Dempster (1958).

(b) Case of Several Samples 

In the problem of two samples, we have shown how to project two
$k$-dimensional samples onto a line so that the means of the two
one-dimensional samples of points along this line are as far apart as possible
relative to the within-sample scatter of these two one-dimensional samples.
In the case of three $k$-dimensional samples we would want to project these
three samples onto a two-dimensional space (ordinary plane) so that the
scatter of the pooled two-dimensional samples is as large as possible
relative to the within-sample scatter of the three projected samples. In
general, if we have $s + 1$ $k$-dimensional samples, $s + 1 < k$, we want to
project these samples onto an $s$-dimensional Euclidean space so that the
scatter of the $s + 1$ pooled $s$-dimensional samples, resulting from the
projection, is as large as possible relative to the within-sample scatter of
the $s + 1$ $s$-dimensional samples.

More precisely, suppose
$(x_{1\xi_\gamma}^{(\gamma)}, \ldots, x_{k\xi_\gamma}^{(\gamma)};\ \xi_\gamma = 1, \ldots, n_\gamma)$,
$\gamma = 1, \ldots, s + 1$, are $s + 1$ $k$-dimensional samples, $s + 1 < k$.
Let $\|u_{ij}^{(\gamma)}\|$ be the scatter matrices of these samples about their
respective means $(\bar{x}_1^{(\gamma)}, \ldots, \bar{x}_k^{(\gamma)})$,
$\gamma = 1, \ldots, s + 1$, and let $\|u_{ij}\|$ be the internal scatter matrix
of the pooled samples. Let
$\|u_{ij}^{W}\| = \|u_{ij}^{(1)} + \cdots + u_{ij}^{(s+1)}\|$ be the
within-sample scatter matrix, assumed to be nonsingular with probability 1, and
let $\|u_{ij}^{B}\| = \|u_{ij} - u_{ij}^{W}\|$ be the between-sample scatter
matrix of the samples. It is to be noted that if $\|u_{ij}^{W}\|$ is nonsingular
with probability 1 so is $\|u_{ij}\|$. Let the projected values be defined as
in (18.7.1) with $\gamma = 1, \ldots, s + 1$, where $(c_{1p}, \ldots, c_{kp})$,
$p = 1, \ldots, s$, are linearly independent vectors. For the $\gamma$th sample,
$\gamma = 1, \ldots, s + 1$, let the coordinates of the projected
$s$-dimensional sample be

(18.7.17)    $z_{p\xi_\gamma}^{(\gamma)} = \sum_{i=1}^{k} c_{ip} x_{i\xi_\gamma}^{(\gamma)}$,  $p = 1, \ldots, s$,  $\xi_\gamma = 1, \ldots, n_\gamma$.

Let $\|\bar{u}_{pq}^{(\gamma)}\|$ be the internal scatter matrix of the
$\gamma$th sample of $z$'s, $\gamma = 1, \ldots, s + 1$, and $\|\bar{u}_{pq}\|$
the internal scatter matrix of the $s + 1$ pooled samples of $z$'s. The
within-sample scatter matrix
$\|\bar{u}_{pq}^{(1)} + \cdots + \bar{u}_{pq}^{(s+1)}\|$ of the $s + 1$ samples
of $z$'s will be denoted by $\|\bar{u}_{pq}^{W}\|$. The between-sample scatter
matrix $\|\bar{u}_{pq} - \bar{u}_{pq}^{W}\|$ of the $z$'s will be denoted by
$\|\bar{u}_{pq}^{B}\|$. Now, our problem is to find $s$ linearly independent
vectors $(c_{1p}, \ldots, c_{kp})$, $p = 1, \ldots, s$, which will maximize the
scatter $|\bar{u}_{pq}|$ relative to the scatter $|\bar{u}_{pq}^{W}|$. This is
equivalent to finding a direction of projection of the $s + 1$ samples onto an
$s$-dimensional space $R_s$ so that the ratio of within-sample scatter to the
scatter of the pooled samples in $R_s$ is the same as the ratio of within-sample
scatter to scatter of pooled samples in the original space $R_k$. In other
words, we want to find vectors $(c_{1p}, \ldots, c_{kp})$, $p = 1, \ldots, s$,
so that

(18.7.18)    $\dfrac{|\bar{u}_{pq}^{W}|}{|\bar{u}_{pq}|} = \dfrac{|u_{ij}^{W}|}{|u_{ij}|}$.

To find the required vectors we proceed as we did in the two-sample
case, and for an arbitrary vector $(c_1, \ldots, c_k)$, we define $S_B$ and
$S_W$ as in (18.7.8). We then find the values of this vector for which $S_B$ is
stationary, subject to the condition that $S_W$ has a fixed value, $C \ne 0$,
say. The values of this vector are given by the equations (18.7.9) where
$\varphi = S_B + (C - S_W)l$. The resulting equations are given by (18.7.10),
where, of course, $\|u_{ij}^{B}\|$ and $\|u_{ij}^{W}\|$ are the between-sample
and within-sample scatter matrices for $s + 1$ samples, rather than for two
samples. The condition under which (18.7.10) can be solved for the case of
$s + 1$ samples is, of course, the $(s + 1)$-sample version of (18.7.11), that
is, we must have

(18.7.19)    $|u_{ij}^{B} - l\,u_{ij}^{W}| = 0$.

Now if $n_1 + \cdots + n_{s+1} - (s + 1) > k$ and if $s + 1 < k$, it can be
verified that $\|u_{ij}^{B}\|$ is of rank $s$ and $\|u_{ij}^{W}\|$ is of rank
$k$ (with probability 1). Thus, $\|u_{ij}^{W}\|$ is positive definite, and
$\|u_{ij}^{B}\|$ is positive semidefinite. If we denote these two matrices by
$u^W$ and $u^B$, we know that it follows from the theory of such matrices (see
Birkhoff and MacLane (1953), for instance) that there exists a real nonsingular
matrix $e$, and $s$ real numbers, all different with probability 1, namely
$0 < l_s < \cdots < l_1 < +\infty$, such that

(18.7.20)    $e'u^W e = I$,  $e'u^B e = L$

where $L$ is a $k \times k$ diagonal matrix whose diagonal elements are
$l_1, \ldots, l_s, 0, \ldots, 0$. Therefore, if we multiply the matrix
$(u^B - l\,u^W)$ on the left by $e'$ and on the right by $e$ we obtain

(18.7.21)    $(e'u^B e - l\,e'u^W e)$.

Taking determinants of this matrix, we find

(18.7.22)    $|e'u^B e - l\,e'u^W e| = |L - lI| = (-l)^{k-s}(l_1 - l) \cdots (l_s - l)$.

But since

(18.7.23)    $|e'u^B e - l\,e'u^W e| = |e'| \cdot |u^B - l\,u^W| \cdot |e|$

and since $|e| = |e'| \ne 0$, it follows that the values of $l$ for which
$|u^B - l\,u^W|$ vanishes are identical with those for which
$|e'u^B e - l\,e'u^W e|$ vanishes. But we see from (18.7.22) that the nonzero
roots (eigenvalues) of the latter are $l_1, \ldots, l_s$. Now let
$(c_{1p}, \ldots, c_{kp})$, $p = 1, \ldots, s$, be the solutions (eigenvectors)
of the $(s + 1)$-sample version of (18.7.10); that is,
$(c_{1p}, \ldots, c_{kp})$ satisfies the following conditions,

(18.7.24)    $\sum_{j=1}^{k} (u_{ij}^{B} - l_p u_{ij}^{W}) c_{jp} = 0$,  $i = 1, \ldots, k$,

(18.7.24a)   $\sum_{i,j=1}^{k} u_{ij}^{W} c_{ip} c_{jp} = C$.

It should be noted that the equation in (18.7.24a) is merely $C - S_W = 0$,
where $S_W$ is evaluated for $(c_1, \ldots, c_k) = (c_{1p}, \ldots, c_{kp})$.
We shall show that $(c_{1p}, \ldots, c_{kp})$, $p = 1, \ldots, s$, are the
required vectors.
If we multiply the equations (18.7.24) by $c_{ip}$ and sum over $i$, making use
of (18.7.24a), we obtain

(18.7.25)    $l_p = \dfrac{1}{C} \sum_{i,j=1}^{k} u_{ij}^{B} c_{ip} c_{jp}$.

If we now multiply (18.7.24) by $c_{iq}$, $q \ne p$, and sum over $i$, we obtain

(18.7.26)    $\sum_{i,j=1}^{k} u_{ij}^{B} c_{jp} c_{iq} = l_p \sum_{i,j=1}^{k} u_{ij}^{W} c_{jp} c_{iq}$.

Similarly,

(18.7.27)    $\sum_{i,j=1}^{k} u_{ij}^{B} c_{jq} c_{ip} = l_q \sum_{i,j=1}^{k} u_{ij}^{W} c_{jq} c_{ip}$.

Since $l_q \ne l_p$, it follows by taking the difference between (18.7.26) and
(18.7.27) that

(18.7.28)    $\sum_{i,j=1}^{k} u_{ij}^{W} c_{ip} c_{jq} = 0$.

It then follows from either (18.7.26) or (18.7.27) that

(18.7.29)    $\sum_{i,j=1}^{k} u_{ij}^{B} c_{ip} c_{jq} = 0$.

Now it can be verified that

(18.7.30)    $\bar{u}_{pq}^{W} = \sum_{i,j=1}^{k} u_{ij}^{W} c_{ip} c_{jq} = C\,\delta_{pq}$,

(18.7.31)    $\bar{u}_{pq} = \bar{u}_{pq}^{W} + \bar{u}_{pq}^{B} = \sum_{i,j=1}^{k} (u_{ij}^{W} + u_{ij}^{B}) c_{ip} c_{jq} = C\,\delta_{pq}(1 + l_p)$,

where $\delta_{pq} = 1$ if $p = q$ and 0 if $p \ne q$. Therefore, we have

(18.7.32)    $\dfrac{|\bar{u}_{pq}^{W}|}{|\bar{u}_{pq}|} = \dfrac{1}{(1 + l_1) \cdots (1 + l_s)}$.

Now consider the ratio of scatters $|u_{ij}^{W}|/|u_{ij}|$ in the original
$k$-dimensional space. It follows from (18.7.20) that since

    $|e'u^W e| = |e'| \cdot |u^W| \cdot |e| = 1$

we have $|u_{ij}^{W}| = 1/|e|^2$. Since $e'(u^W + u^B)e = I + L$, it follows
from (18.7.20) similarly that $|u_{ij}| = (1 + l_1) \cdots (1 + l_s)/|e|^2$.
Therefore,

(18.7.33)    $\dfrac{|u_{ij}^{W}|}{|u_{ij}|} = \dfrac{1}{(1 + l_1) \cdots (1 + l_s)}$

and thus the two ratios of scatters on the left-hand sides of (18.7.32)
and (18.7.33) are equal, the common value being
$1/[(1 + l_1) \cdots (1 + l_s)]$.
We may summarize as follows:


18.7.2 Let $(x_{1\xi_\gamma}^{(\gamma)}, \ldots, x_{k\xi_\gamma}^{(\gamma)};\ \xi_\gamma = 1, \ldots, n_\gamma)$,
$\gamma = 1, \ldots, s + 1$, $n_1 + \cdots + n_{s+1} - (s + 1) > k$,
$s + 1 < k$, be $s + 1$ independent $k$-dimensional samples whose ratio of
within-sample scatter to pooled-sample internal scatter is
$|u_{ij}^{W}|/|u_{ij}|$ where $|u_{ij}^{W}| \ne 0$ with probability 1. Let these
sample points be linearly mapped (projected) into an $s$-dimensional space by
(18.7.17) and let $|\bar{u}_{pq}^{W}|/|\bar{u}_{pq}|$ be the ratio of
within-sample scatter to pooled-sample internal scatter of the mapped sample
points. The eigenvectors $(c_{1p}, \ldots, c_{kp})$, $p = 1, \ldots, s < k$,
all subject to condition (18.7.24a), for which the two scatter ratios
$|u_{ij}^{W}|/|u_{ij}|$ and $|\bar{u}_{pq}^{W}|/|\bar{u}_{pq}|$ are equal are
the solutions of (18.7.24) where the eigenvalues
$0 < l_s < \cdots < l_1 < +\infty$ are the $s$ nonzero roots of (18.7.19). The
common value of these two ratios is $1/[(1 + l_1) \cdots (1 + l_s)]$. The
eigenvalues $l_p$, $p = 1, \ldots, s$, are given by (18.7.25), and they are all
different with probability 1. Any two of the eigenvectors
$(c_{1p}, \ldots, c_{kp})$ and $(c_{1q}, \ldots, c_{kq})$, $p \ne q$, satisfy
conditions (18.7.28) and (18.7.29).
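
The equality between the scatter ratio in the original space and the product $1/[(1 + l_1) \cdots (1 + l_s)]$ can be illustrated for three samples in three dimensions ($k = 3$, $s = 2$). The sketch below is an assumption-laden illustration: the simulated samples, the seed, and the helper names are invented, and the two nonzero roots are obtained from the trace and the second elementary symmetric function of $(u^W)^{-1}u^B$, which is a computational shortcut chosen here rather than the procedure of the text.

```python
import math
import random

def scatter_matrix(sample):
    n, k = len(sample), len(sample[0])
    m = [sum(x[i] for x in sample) / n for i in range(k)]
    return [[sum((x[i] - m[i]) * (x[j] - m[j]) for x in sample)
             for j in range(k)] for i in range(k)]

def mat_mul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def inverse(a):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    k = len(a)
    aug = [row[:] + [float(i == j) for j in range(k)] for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def det3(a):
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

random.seed(2)
k = 3
centers = [(0.0, 0.0, 0.0), (2.0, 0.0, 1.0), (0.0, 2.0, -1.0)]
samples = [[[random.gauss(c, 1.0) for c in center] for _ in range(8)]
           for center in centers]

u_w = [[0.0] * k for _ in range(k)]
for smp in samples:
    sc = scatter_matrix(smp)
    u_w = [[u_w[i][j] + sc[i][j] for j in range(k)] for i in range(k)]
u = scatter_matrix([x for smp in samples for x in smp])   # pooled sample
u_b = [[u[i][j] - u_w[i][j] for j in range(k)] for i in range(k)]

# Roots of |u_b - l u_w| = 0 are the eigenvalues of m = u_w^{-1} u_b; one root is 0.
m = mat_mul(inverse(u_w), u_b)
trace = sum(m[i][i] for i in range(k))
m_sq = mat_mul(m, m)
e2 = 0.5 * (trace ** 2 - sum(m_sq[i][i] for i in range(k)))  # product l_1 * l_2
disc = math.sqrt(max(trace ** 2 - 4.0 * e2, 0.0))
l1, l2 = (trace + disc) / 2.0, (trace - disc) / 2.0

ratio_original = det3(u_w) / det3(u)
ratio_from_roots = 1.0 / ((1.0 + l1) * (1.0 + l2))
print(l1, l2, ratio_original, ratio_from_roots)
```

The last two printed values agree to rounding error, and $(u^W)^{-1}u^B$ has determinant zero up to rounding, consistent with the between-sample scatter matrix having rank $s = 2$.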


The main significance of this theorem is that if the s eigenvectors
$(c_{1p}, \dots, c_{kp})$, $p = 1, \dots, s$, are used in (18.7.17) to map linearly the
k-dimensional sample points into an s-dimensional space, the scatter ratio
$|u^*_{pq}|/|u_{pq}|$ of the mapped points in $R_s$ is exactly the same as the scatter
ratio $|u^*_{ij}|/|u_{ij}|$ in the original k-dimensional space. This is equivalent
to projecting the k-dimensional sample points into an s-dimensional
Euclidean space so that the pooled-sample scatter $|u_{pq}|$ is as large as possible
relative to the within-sample scatter $|u^*_{pq}|$. If we should desire to project the
s + 1 samples onto a one-dimensional space so as to obtain as large a pooled-sample
scatter of these one-dimensional points as possible relative to the
within-sample scatter of the points, we would use in (18.7.17) the eigenvector
$(c_{11}, \dots, c_{k1})$ corresponding to the largest eigenvalue $l_1$. The ratio
of within-sample scatter to pooled-sample scatter for sample points as
mapped onto the one-dimensional line is $1/(1 + l_1)$. Similarly, if we want
to project the sample points into a t-dimensional space, $t < s$, we use the
eigenvectors $(c_{1p}, \dots, c_{kp})$, $p = 1, \dots, t$, corresponding to the eigenvalues
$l_1, \dots, l_t$. The ratio of within-sample scatter to pooled-sample scatter for
the sample points in this t-dimensional space is $1/[(1 + l_1) \cdots (1 + l_t)]$.
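The one-dimensional case can be checked numerically. The following is a minimal sketch, not the author's procedure: it assumes two made-up samples in k = 2 dimensions (so s = 1), and the helper names `mean`, `scatter` and `det2` are hypothetical. For two samples the between-sample scatter has rank one, so the single nonzero root of the determinantal equation equals the trace of (within)^{-1}(between), and the ratio of within-sample scatter to pooled-sample scatter should equal 1/(1 + l_1).

```python
# Sketch under stated assumptions: two invented samples in the plane.

def mean(rows):
    """Component-wise mean of a list of points."""
    n = len(rows)
    return [sum(r[i] for r in rows) / n for i in range(len(rows[0]))]

def scatter(rows, center):
    """Scatter matrix sum over points of (x_i - c_i)(x_j - c_j)."""
    k = len(center)
    return [[sum((r[i] - center[i]) * (r[j] - center[j]) for r in rows)
             for j in range(k)] for i in range(k)]

def det2(a):
    """Determinant of a 2 x 2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

sample_1 = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (2.0, 3.0)]
sample_2 = [(5.0, 6.0), (6.0, 8.0), (7.0, 7.0), (6.0, 5.0), (8.0, 9.0)]

# Within-sample scatter: sum of the two internal scatter matrices.
w1 = scatter(sample_1, mean(sample_1))
w2 = scatter(sample_2, mean(sample_2))
within = [[w1[i][j] + w2[i][j] for j in range(2)] for i in range(2)]

# Pooled-sample internal scatter: all points about the grand mean.
pooled_rows = sample_1 + sample_2
pooled = scatter(pooled_rows, mean(pooled_rows))

# Between-sample scatter = pooled - within (rank one for two samples).
between = [[pooled[i][j] - within[i][j] for j in range(2)] for i in range(2)]

# Single nonzero root l_1 of |between - l * within| = 0:
# trace(within^{-1} between), valid because `between` has rank one.
d = det2(within)
within_inv = [[within[1][1] / d, -within[0][1] / d],
              [-within[1][0] / d, within[0][0] / d]]
l1 = sum(within_inv[i][j] * between[j][i]
         for i in range(2) for j in range(2))

ratio = det2(within) / det2(pooled)
print(l1, ratio, 1.0 / (1.0 + l1))
```

The last two printed numbers should agree up to rounding, which is the statement of the theorem specialised to a single eigenvalue.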

It should be noted that if we assume that the s + 1 samples came from
k-dimensional normal distributions all having the same covariance
matrix and if we wish to test the hypothesis that those normal
distributions also have identical vectors of means, then the scatter ratio
$|u^*_{ij}|/|u_{ij}|$ is equivalent to the Neyman-Pearson likelihood ratio criterion for
making this test. Furthermore, if the hypothesis is true this ratio has the
same distribution as $R_s$ in (18.5.3) with $m = n_1 + \cdots + n_{s+1} - (s + 1)$.

18.8 DISTRIBUTION OF EIGENVALUES IN 
DISCRIMINANT ANALYSIS 

In the discriminant analysis problem of Section 18.7 the eigenvalues
$l_s < \cdots < l_1$ and the corresponding eigenvectors $(c_{1p}, \dots, c_{kp})$, $p = 1, \dots, s$,
play key roles. The extent to which k-dimensional sample clusters are
separated in k-space depends on the magnitudes of the eigenvalues: the
larger the eigenvalues the greater the separation. If the samples are
random samples from identical k-dimensional normal distributions,
the eigenvalues have a distribution that we can determine without great
difficulty. In the particular case of two samples from identical k-dimensional
normal distributions the ratio of within-sample scatter to total
scatter has the value $1/(1 + l_1)$ where $l_1$ is the only eigenvalue involved.
Furthermore, it is seen from Section 18.4(b) that $1/(1 + l_1)$ is exactly the
same as $R_1'$ defined in (18.4.15), and, of course, has the same distribution
as $R_1'$. It will be further noted that $(n_1 + n_2 - 2) l_1$ is Hotelling's $T^2$ for
the two-sample problem.

In this section we shall consider the sampling distribution of the eigenvalues
$l_1, \dots, l_s$ in the case where the s + 1 samples all come from identical
k-dimensional normal distributions. There are two important cases to be
considered: (i) the case where $s + 1 > k$, that is, where the number of
eigenvalues is equal to k, and (ii) the case where $s + 1 \le k$, that is, where
the number of eigenvalues is s which is less than k.

(a) The Case of k Eigenvalues 

In deriving the basic distribution theory of these eigenvalues, it is 
convenient to establish first the following general result: 

18.8.1  Suppose $\{v_{ij}\}$ and $\{v'_{ij}\}$ are two independent systems of random variables
having Wishart distributions $W(k, n, \|\sigma_{ij}\|)$ and $W(k, n', \|\sigma_{ij}\|)$,
respectively, where $n, n' \ge k$. Then the roots $0 < g_k < \cdots < g_1 < +\infty$
of the equation

(18.8.1)  $|v'_{ij} - (v_{ij} + v'_{ij})\, g| = 0$

have a distribution with probability element

(18.8.2)  $K \Bigl[\prod_{i=1}^{k} (1 - g_i)\Bigr]^{\frac12 (n - k - 1)} \Bigl[\prod_{i=1}^{k} g_i\Bigr]^{\frac12 (n' - k - 1)} \prod_{i > j = 1}^{k} (g_j - g_i)\ dg_1 \cdots dg_k$

in the region for which $0 < g_k < g_{k-1} < \cdots < g_1 < 1$ and 0
otherwise, where

$K = \pi^{k/2} \prod_{i=1}^{k} \dfrac{\Gamma\bigl(\frac{n + n' + 1 - i}{2}\bigr)}{\Gamma\bigl(\frac{n + 1 - i}{2}\bigr)\, \Gamma\bigl(\frac{n' + 1 - i}{2}\bigr)\, \Gamma\bigl(\frac{k + 1 - i}{2}\bigr)}.$
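As a sanity check on the constant, the case k = 1 can be compared with the ordinary beta distribution. The sketch below assumes that K is the product of gamma functions $\pi^{k/2} \prod_i \Gamma(\frac{n+n'+1-i}{2}) / [\Gamma(\frac{n+1-i}{2}) \Gamma(\frac{n'+1-i}{2}) \Gamma(\frac{k+1-i}{2})]$ and that the exponents in (18.8.2) are $\frac12(n-k-1)$ on $(1 - g_i)$ and $\frac12(n'-k-1)$ on $g_i$; the function names `root_constant` and `beta_constant` are hypothetical. For k = 1 the probability element is then that of a beta variable with parameters (n'/2, n/2), so K should equal 1/B(n'/2, n/2).

```python
from math import gamma, pi

def root_constant(k, n, n_prime):
    """Constant K of 18.8.1 as a product of gamma functions (assumed form)."""
    value = pi ** (k / 2.0)
    for i in range(1, k + 1):
        value *= gamma((n + n_prime + 1 - i) / 2.0)
        value /= (gamma((n + 1 - i) / 2.0)
                  * gamma((n_prime + 1 - i) / 2.0)
                  * gamma((k + 1 - i) / 2.0))
    return value

def beta_constant(a, b):
    """1 / B(a, b), the normalizing constant of the beta density."""
    return gamma(a + b) / (gamma(a) * gamma(b))

# k = 1: K (1 - g)^(n/2 - 1) g^(n'/2 - 1) dg is a beta(n'/2, n/2) element.
n, n_prime = 7, 4
k1_constant = root_constant(1, n, n_prime)
k1_beta = beta_constant(n_prime / 2.0, n / 2.0)
print(k1_constant, k1_beta)
```

The two printed values coincide because $\Gamma(\frac12) = \sqrt{\pi}$ cancels the factor $\pi^{1/2}$.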


It can be shown by argument similar to that following 18.6.2 that the
proof of 18.8.1 is equivalent to proving that the roots of

(18.8.1a)  $|v'^*_{ij} - (v^*_{ij} + v'^*_{ij})\, g| = 0$

have probability element (18.8.2) where $\{v^*_{ij}\}$ and $\{v'^*_{ij}\}$ are independent
sets of random variables having Wishart distributions $W(k, n, \|\delta_{ij}\|)$ and
$W(k, n', \|\delta_{ij}\|)$. So we proceed as follows.

Since $\|v^*_{ij} + v'^*_{ij}\|$ and $\|v'^*_{ij}\|$ are both positive definite (with probability
one), there exists a real nonsingular matrix $\|e_{ij}\| = e$, say, and a diagonal
matrix m with diagonal elements all different with probability 1, namely
$1 > m_1 > \cdots > m_k > 0$, so that (denoting $\|v^*_{ij} + v'^*_{ij}\|$ by $v^* + v'^*$ and
$\|v'^*_{ij}\|$ by $v'^*$)

(18.8.3)
$v^* + v'^* = e'e$
$v'^* = e'me.$

But equations (18.8.3) are equivalent to

(18.8.3a)
$v'^* = e'me$
$v^* = e'(I - m)e.$

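It follows from (18.8.3) that the equation (18.8.1a) can be written as

(18.8.4)  $|e'me - g\, e'e| = 0$

or

(18.8.4a)  $|e'| \cdot |m - gI| \cdot |e| = 0.$

Hence the roots $g_1, \dots, g_k$ are, respectively, identically equal to $m_1, \dots, m_k$.
Now the probability element of $\{v^*_{ij}\}$ and $\{v'^*_{ij}\}$ is

(18.8.5)  $A\, |v^*_{ij}|^{\frac12 (n - k - 1)}\, |v'^*_{ij}|^{\frac12 (n' - k - 1)} \exp\Bigl[-\tfrac12 \sum_{i=1}^{k} (v^*_{ii} + v'^*_{ii})\Bigr] \prod_{i \le j} dv^*_{ij}\, dv'^*_{ij}$

where A is a normalizing constant.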
If we apply the transformation (18.8.3a) to (18.8.5), we find the Jacobian
J to be of the following form:

(18.8.6)  $J = \begin{vmatrix} \dfrac{\partial(v^*_{ij})}{\partial(e_{ij})} & \dfrac{\partial(v^*_{ij})}{\partial(m_i)} \\[2ex] \dfrac{\partial(v'^*_{ij})}{\partial(e_{ij})} & \dfrac{\partial(v'^*_{ij})}{\partial(m_i)} \end{vmatrix}$

in which the left-hand blocks have $k^2$ columns, the right-hand blocks have k
columns, and the upper and lower blocks each have $\frac12 k(k + 1)$ rows.

If we add the bottom $\frac12 k(k + 1)$ rows to the top $\frac12 k(k + 1)$ rows, respectively,
we obtain a form for J in which the elements in the upper right-hand
block are of form $\partial(v^*_{ij} + v'^*_{ij})/\partial(m_i)$ which, in view of (18.8.3), are all 0.
The only part of the resulting form of J which involves the m's are the
elements in the lower left-hand block, and are all homogeneous linear
forms in $m_1, \dots, m_k$ with coefficients depending only on the $e_{ij}$. Hence,
if the resulting form of J is expanded by Laplace's method (see Bocher
(1907)) with respect to the top $\frac12 k(k + 1)$ rows, it is evident that every
complementary minor picked from the bottom $\frac12 k(k + 1)$ rows will have
$\frac12 k(k - 1)$ columns selected from the lower left-hand block, and hence
every term in the Laplace expansion will be a homogeneous polynomial
in $m_1, \dots, m_k$ of degree $\frac12 k(k - 1)$. If $m_i = m_j$, $i > j$, then the
transformation (18.8.3a) is indeterminate and J = 0, and hence $(m_j - m_i)^{a_{ij}}$
is a factor of J, where $a_{ij}$ is a positive integer $\ge 1$. Therefore, J is of form
$G \prod_{i > j = 1}^{k} (m_j - m_i)^{a_{ij}}$ where G depends only on the $e_{ij}$. But the fact that J
is a polynomial of degree $\frac12 k(k - 1)$ implies that each $a_{ij} = 1$, hence

(18.8.7)  $J = G \prod_{i > j = 1}^{k} (m_j - m_i).$

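It is seen from (18.8.3) and (18.8.3a) that

(18.8.8)
$|v'^*_{ij}| = |e_{ij}|^2 \prod_{i=1}^{k} m_i$
$|v^*_{ij}| = |e_{ij}|^2 \prod_{i=1}^{k} (1 - m_i)$
$\sum_{i=1}^{k} (v^*_{ii} + v'^*_{ii}) = \sum_{i,j=1}^{k} e_{ij}^2.$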

Therefore, applying the results of (18.8.7) and (18.8.8) to (18.8.5) we see
that the $\{e_{ij}\}$ and $\{m_i\}$ are independent sets of random variables and that
the probability element of the $\{m_i\}$ is

(18.8.9)  $K \Bigl\{ \Bigl[\prod_{i=1}^{k} (1 - m_i)\Bigr]^{\frac12 (n - k - 1)} \Bigl[\prod_{i=1}^{k} m_i\Bigr]^{\frac12 (n' - k - 1)} \prod_{i > j = 1}^{k} (m_j - m_i) \Bigr\}\ dm_1 \cdots dm_k$

over the region for which $0 < m_k < \cdots < m_1 < 1$ where K is a
constant to be determined.

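Let us denote the integral of the function in { } over this region by
$\varphi_k\bigl(\tfrac12 (n - k - 1),\ \tfrac12 (n' - k - 1)\bigr)$. Then we have

(18.8.10)  $\dfrac{1}{K} = \varphi_k\Bigl(\dfrac{n - k - 1}{2},\ \dfrac{n' - k - 1}{2}\Bigr).$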


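Since $\{v^*_{ij}\}$ and $\{v'^*_{ij}\}$ have independent Wishart distributions $W(k, n, \|\delta_{ij}\|)$
and $W(k, n', \|\delta_{ij}\|)$ we may write down immediately from (18.2.33) that

(18.8.11)  $\mathscr{E}\bigl(|v^*_{ij}|^{r}\, |v'^*_{ij}|^{-r}\bigr) = \prod_{i=1}^{k} \dfrac{\Gamma\bigl(\frac{n + 1 - i}{2} + r\bigr)\, \Gamma\bigl(\frac{n' + 1 - i}{2} - r\bigr)}{\Gamma\bigl(\frac{n + 1 - i}{2}\bigr)\, \Gamma\bigl(\frac{n' + 1 - i}{2}\bigr)}.$

But it follows from (18.8.8) that

(18.8.12)  $\mathscr{E}\bigl(|v^*_{ij}|^{r}\, |v'^*_{ij}|^{-r}\bigr) = \mathscr{E}\Bigl(\Bigl[\prod_{i=1}^{k} (1 - m_i)\Bigr]^{r} \Bigl[\prod_{i=1}^{k} m_i\Bigr]^{-r}\Bigr) = \dfrac{\varphi_k\bigl(\frac{n - k - 1}{2} + r,\ \frac{n' - k - 1}{2} - r\bigr)}{\varphi_k\bigl(\frac{n - k - 1}{2},\ \frac{n' - k - 1}{2}\bigr)}.$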

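Putting $r = -\tfrac12 (n - k - 1)$ in (18.8.11) and (18.8.12), we obtain

(18.8.13)  $\dfrac{\varphi_k\bigl(0,\ \frac{n + n' - 2k - 2}{2}\bigr)}{\varphi_k\bigl(\frac{n - k - 1}{2},\ \frac{n' - k - 1}{2}\bigr)} = \prod_{i=1}^{k} \dfrac{\Gamma\bigl(\frac{k + 2 - i}{2}\bigr)\, \Gamma\bigl(\frac{n + n' - k - i}{2}\bigr)}{\Gamma\bigl(\frac{n + 1 - i}{2}\bigr)\, \Gamma\bigl(\frac{n' + 1 - i}{2}\bigr)}.$

Our problem of finding K is now reduced to determining $\varphi_k(0, M)$, where
$M = \tfrac12 (n + n' - 2k - 2)$. We note that

(18.8.14)  $\varphi_k(0, M) = \int \Bigl(\prod_{i=1}^{k} m_i\Bigr)^{M} \prod_{i > j = 1}^{k} (m_j - m_i)\ dm_1 \cdots dm_k,$

the integral being taken over the region $0 < m_k < \cdots < m_1 < 1$.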
Let $m_1 = t$, $m_2 = t m_2'$, ..., $m_k = t m_k'$. Then we find

(18.8.15)  $\varphi_k(0, M) = \dfrac{1}{kM + \frac12 k(k + 1)}\ \varphi_{k-1}(1, M).$

From (18.8.13) we have

(18.8.16)  $\dfrac{\varphi_k(0, M)}{\varphi_k(1, M - 1)} = \prod_{i=1}^{k} \dfrac{2M + k - i}{k + 2 - i}.$

By solving the system of two-parameter difference equations defined by
(18.8.15) and (18.8.16) we obtain

(18.8.17)  $\varphi_k\Bigl(0,\ \dfrac{n + n' - 2k - 2}{2}\Bigr) = \pi^{-k/2} \prod_{i=1}^{k} \dfrac{\Gamma\bigl(\frac{k + 1 - i}{2}\bigr)\, \Gamma\bigl(\frac{k + 2 - i}{2}\bigr)\, \Gamma\bigl(\frac{n + n' - k - i}{2}\bigr)}{\Gamma\bigl(\frac{n + n' + 1 - i}{2}\bigr)}.$

Substituting this in (18.8.13) and solving for $1/\varphi_k\bigl(\tfrac12 (n - k - 1),\ \tfrac12 (n' - k - 1)\bigr)$,
which we recall from (18.8.10) is the value of K, we obtain
the value of K given in 18.8.1.
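The reduction step can be checked for k = 2. The sketch below is an illustration only; the function names are hypothetical. It assumes that $\varphi_k(a, M)$ denotes the integral of $[\prod (1 - m_i)]^a [\prod m_i]^M \prod_{i>j} (m_j - m_i)$ over $0 < m_k < \cdots < m_1 < 1$, and that the substitution $m_1 = t$, $m_i = t m_i'$ yields $\varphi_k(0, M) = \varphi_{k-1}(1, M)/(kM + \frac12 k(k+1))$. The recursive value is compared with a direct midpoint-rule approximation of the double integral.

```python
from fractions import Fraction
from math import factorial

def phi_1(a, M):
    """Integral over (0, 1) of (1 - m)^a m^M dm for nonnegative integers
    a and M, i.e. the beta integral a! M! / (a + M + 1)!."""
    return Fraction(factorial(a) * factorial(M), factorial(a + M + 1))

def phi_2_zero_recursive(M):
    """phi_2(0, M) from the assumed reduction
    phi_k(0, M) = phi_{k-1}(1, M) / (k M + k (k + 1) / 2) with k = 2."""
    k = 2
    return phi_1(1, M) / (k * M + k * (k + 1) // 2)

def phi_2_zero_numeric(M, steps=400):
    """Midpoint-rule approximation of the double integral of
    (m_1 m_2)^M (m_1 - m_2) over 0 < m_2 < m_1 < 1."""
    h = 1.0 / steps
    total = 0.0
    for a in range(steps):
        m1 = (a + 0.5) * h
        for b in range(a):  # grid cells strictly below the diagonal
            m2 = (b + 0.5) * h
            total += (m1 * m2) ** M * (m1 - m2)
    return total * h * h

M = 1
print(phi_2_zero_recursive(M), phi_2_zero_numeric(M))
```

For M = 1 the recursion gives exactly 1/30, and the numerical value agrees to several decimal places.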

In the discriminant analysis problem we are interested primarily in the
eigenvalues of the equation

(18.8.18)  $|v'_{ij} - l\, v_{ij}| = 0$

rather than those of (18.8.1).

But the roots of (18.8.18) are related to those of (18.8.1) as follows:

(18.8.19)  $g_1 = \dfrac{l_1}{1 + l_1},\ \dots,\ g_k = \dfrac{l_k}{1 + l_k}.$

Thus, if we apply the transformation (18.8.19) to (18.8.2) we obtain the
following result.


18.8.2  If $\{v_{ij}\}$ and $\{v'_{ij}\}$ are independent sets of random variables having
Wishart distributions $W(k, n, \|\sigma_{ij}\|)$ and $W(k, n', \|\sigma_{ij}\|)$ then the
eigenvalues $0 < l_k < \cdots < l_1 < +\infty$ of $|v'_{ij} - l\, v_{ij}| = 0$ have the probability
element

(18.8.20)  $K \Bigl[\prod_{i=1}^{k} (1 + l_i)\Bigr]^{-\frac12 (n + n')} \Bigl[\prod_{i=1}^{k} l_i\Bigr]^{\frac12 (n' - k - 1)} \prod_{i > j = 1}^{k} (l_j - l_i)\ dl_1 \cdots dl_k$

in the region for which $0 < l_k < l_{k-1} < \cdots < l_1 < +\infty$ and
0 otherwise, and where K is given in 18.8.1.
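The exponent on $(1 + l_i)$ can be traced through the change of variables $g_i = l_i/(1 + l_i)$. The following worked step is a sketch, assuming that (18.8.2) carries the exponents $\frac12 (n - k - 1)$ on $\prod (1 - g_i)$ and $\frac12 (n' - k - 1)$ on $\prod g_i$ together with the factor $\prod_{i>j} (g_j - g_i)$:

```latex
\begin{align*}
g_i &= \frac{l_i}{1 + l_i}, \qquad
1 - g_i = \frac{1}{1 + l_i}, \qquad
dg_i = \frac{dl_i}{(1 + l_i)^2}, \\
g_j - g_i &= \frac{l_j - l_i}{(1 + l_i)(1 + l_j)}, \qquad
\prod_{i>j=1}^{k} (g_j - g_i)
  = \Bigl[\prod_{i=1}^{k} (1 + l_i)\Bigr]^{-(k-1)} \prod_{i>j=1}^{k} (l_j - l_i).
\end{align*}
% Each index occurs in k - 1 of the pairs, hence the power -(k - 1).
% Collecting the powers of (1 + l_i) from the four sources:
\[
-\tfrac12 (n - k - 1) \;-\; \tfrac12 (n' - k - 1) \;-\; (k - 1) \;-\; 2
  \;=\; -\tfrac12 (n + n').
\]
```

The power of $l_i$ is unchanged at $\frac12 (n' - k - 1)$, and the constant K is not affected by the transformation.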

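Returning now to the discriminant analysis problem for s + 1 samples
where $s + 1 > k$ we have the following corollary of 18.8.2.

18.8.2a  If $\|u_{0ij}\|$ and $\|u^*_{ij}\|$ are the between-sample and within-sample
scatter matrices of s + 1 samples of sizes $n_1, \dots, n_{s+1}$ respectively,
where $s + 1 > k$ and $n_1 + \cdots + n_{s+1} - (s + 1) \ge k$,
then the eigenvalues $0 < l_k < \cdots < l_1$ of

(18.8.21)  $|u_{0ij} - l\, u^*_{ij}| = 0$

have probability element

(18.8.22)  $K \Bigl[\prod_{i=1}^{k} (1 + l_i)\Bigr]^{-\frac12 (n + s)} \Bigl[\prod_{i=1}^{k} l_i\Bigr]^{\frac12 (s - k - 1)} \prod_{i > j = 1}^{k} (l_j - l_i)\ dl_1 \cdots dl_k$

in the region for which $0 < l_k < l_{k-1} < \cdots < l_1 < +\infty$ and 0
otherwise, and where

(18.8.23)  $K = \pi^{k/2} \prod_{i=1}^{k} \dfrac{\Gamma\bigl(\frac{n + s + 1 - i}{2}\bigr)}{\Gamma\bigl(\frac{n + 1 - i}{2}\bigr)\, \Gamma\bigl(\frac{s + 1 - i}{2}\bigr)\, \Gamma\bigl(\frac{k + 1 - i}{2}\bigr)}$

where $n = n_1 + \cdots + n_{s+1} - (s + 1)$.

This proof follows at once from 18.8.2 after noting that under the
assumptions stated, the between-sample and within-sample scatter matrices
are independent sets of random variables having Wishart distributions
$W(k, s, \|\sigma_{ij}\|)$ and $W(k, n, \|\sigma_{ij}\|)$
where $n = n_1 + \cdots + n_{s+1} - (s + 1)$.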


(b) The Case of s Eigenvalues (s < k)

Now, in 18.8.1 suppose $n' < k$; then the elements of $\|v'_{ij}\|$ will have a
degenerate Wishart distribution. In fact, $\|v'_{ij}\|$ will have the same distribution
as $\bigl\|\sum_{\xi=1}^{n'} z_{i\xi} z_{j\xi}\bigr\|$ where $(z_{1\xi}, \dots, z_{k\xi};\ \xi = 1, \dots, n')$ is a sample of
size n' from the k-dimensional normal distribution $N(\{0\}; \|\sigma_{ij}\|)$. Then
(18.8.1) has n' eigenvalues $0 < g_{n'} < \cdots < g_1$, and it is evident that the
probability element of $g_1, \dots, g_{n'}$ has the same form as (18.8.2) with
k, n, n' replaced by n', n + n' − k, k respectively.

Returning to the discriminant analysis problem for s + 1 samples,
where $s + 1 \le k$, it will be seen that, in this case, (18.8.21) has eigenvalues
$0 < l_s < \cdots < l_1$, $s < k$. Therefore, the probability element of
these eigenvalues is given by (18.8.22) with k, n, s replaced by s, n + s − k,
k, respectively, remembering, of course, that $n = n_1 + \cdots + n_{s+1} - (s + 1)$.

The results expressed in 18.8.1 and 18.8.2 are companion results to
those given in 18.6.2 and were obtained by the various authors referred to
in the early part of Section 18.6(b).
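The bookkeeping for the two cases can be summarised in a small helper. This is a sketch only; the function name `root_law_parameters` is hypothetical. It returns the number of nonzero eigenvalues and the triple to be used in place of (k, n, s) in (18.8.22) and (18.8.23), where n denotes $n_1 + \cdots + n_{s+1} - (s + 1)$.

```python
def root_law_parameters(k, s, n):
    """Return (number_of_roots, (k_arg, n_arg, s_arg)).

    k: dimension of the sample points
    s: number of samples minus one
    n: n_1 + ... + n_{s+1} - (s + 1), the within-sample degrees of freedom
    """
    if s + 1 > k:
        # Case (a): k eigenvalues; (18.8.22) applies as written.
        return k, (k, n, s)
    # Case (b): only s eigenvalues; replace k, n, s by s, n + s - k, k.
    return s, (s, n + s - k, k)

case_a = root_law_parameters(k=2, s=4, n=20)
case_b = root_law_parameters(k=5, s=2, n=30)
print(case_a, case_b)
```

For example, three samples (s = 2) in five dimensions give two eigenvalues, whose probability element is (18.8.22) evaluated with 2, n − 3, 5 in place of k, n, s.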

18.9 CANONICAL CORRELATION 
(a) Determination of Canonical Correlation Coefficients 

In this section we shall consider the following problem: Suppose we have
a sample from a k-dimensional distribution. We wish to find a linear
function of the first s components and a linear function of the last t
components (s + t = k), so that these two linear functions have the highest
possible correlation coefficient. In a practical situation the two linear
functions may be regarded as indices constructed from the first s and last t
variables, respectively, and one of these indices is to be used for predicting
or estimating the other. Then it is natural to inquire how these two linear
functions should be constructed so that the ordinary correlation coefficient
between them is as large as possible.

More precisely, suppose $(x_{i\xi};\ i = 1, \dots, k;\ \xi = 1, \dots, n)$, $n > k$, is
a sample of size n from a k-dimensional distribution, and let $\|u_{ij}\|$ be the
internal scatter matrix of this sample. Let $(x_{p\xi};\ p = 1, \dots, s;\ \xi = 1, \dots, n)$
be the first s components of the sample, and let $\|u_{pq}\|$, $p, q = 1, \dots, s$, be
its internal scatter matrix. Similarly let $(x_{v\xi};\ v = s + 1, \dots, k;\ \xi = 1, \dots, n)$
be the last t components (s + t = k and $t \ge s > 0$) of the sample
with internal scatter matrix $\|u_{vw}\|$, $v, w = s + 1, \dots, k$. We note that

(18.9.1)  $\|u_{ij}\| = \begin{Vmatrix} u_{pq} & u_{pw} \\ u_{vq} & u_{vw} \end{Vmatrix}.$


We assume that $\|u_{ij}\|$ is nonsingular with probability 1. Now consider
real vectors $(c_{1p};\ p = 1, \dots, s)$ and $(c_{2v};\ v = s + 1, \dots, k)$ and let

(18.9.2)  $z_{1\xi} = \sum_{p=1}^{s} c_{1p} x_{p\xi}, \qquad z_{2\xi} = \sum_{v=s+1}^{k} c_{2v} x_{v\xi},$

$\xi = 1, \dots, n$. Let

$\begin{Vmatrix} \bar u_{11} & \bar u_{12} \\ \bar u_{21} & \bar u_{22} \end{Vmatrix}$

be the scatter matrix of the sample $(z_{1\xi}, z_{2\xi};\ \xi = 1, \dots, n)$ about its own
mean. Our problem is to determine the two vectors $(c_{1p};\ p = 1, \dots, s)$
and $(c_{2v};\ v = s + 1, \dots, k)$ subject to some normalizing condition, which
we may take as $\bar u_{11} = \bar u_{22} = 1$ without loss of generality, so that the
correlation coefficient

(18.9.3)  $R = \dfrac{\bar u_{12}}{\sqrt{\bar u_{11} \bar u_{22}}}$

is a maximum.

The solution of this problem is due to Hotelling (1935) and can be
summarized in the following statement:

18.9.1  The eigenvectors $(c_{1p};\ p = 1, \dots, s)$ and $(c_{2v};\ v = s + 1, \dots, k)$
which maximize R subject to the conditions $\bar u_{11} = \bar u_{22} = 1$ are the
solutions of the equations

(18.9.4)
$-\sqrt{l_1} \sum_{q=1}^{s} u_{pq} c_{1q} + \sum_{w=s+1}^{k} u_{pw} c_{2w} = 0, \quad p = 1, \dots, s$
$\sum_{q=1}^{s} u_{vq} c_{1q} - \sqrt{l_1} \sum_{w=s+1}^{k} u_{vw} c_{2w} = 0, \quad v = s + 1, \dots, k$

where $l_1$ is the largest eigenvalue of

(18.9.5)  $\begin{vmatrix} l\, u_{pq} & u_{pw} \\ u_{vq} & u_{vw} \end{vmatrix} = 0.$

Furthermore, the maximum value of R is $\sqrt{l_1}$.

More generally, (18.9.5) has s eigenvalues $l_1, \dots, l_s$ on the
interval (0, 1). We assume that the sample $(x_{i\xi};\ i = 1, \dots, k;\ \xi = 1, \dots, n)$
comes from a distribution such that these roots are all
different with probability 1, in which case they may be labeled as
follows: $0 < l_s < \cdots < l_1 < 1$. The eigenvectors $(c^{(g)}_{1p};\ p = 1, \dots, s)$
and $(c^{(g)}_{2v};\ v = s + 1, \dots, k)$ are solutions of equations
(18.9.4) with $l_1$ replaced by $l_g$, $g = 1, \dots, s$. The value of R
when computed from these eigenvectors is $\sqrt{l_g}$. The value of R
computed by using the eigenvectors $(c^{(g)}_{1p};\ p = 1, \dots, s)$ and
$(c^{(g')}_{2v};\ v = s + 1, \dots, k)$, $g \neq g'$, is equal to zero.

The correlation coefficients $R^{(1)}, \dots, R^{(s)}$, which have the values
$\sqrt{l_1}, \dots, \sqrt{l_s}$ respectively, are called the canonical correlation coefficients
between the first s components $(x_{1\xi}, \dots, x_{s\xi};\ \xi = 1, \dots, n)$ and the last
t components $(x_{s+1,\xi}, \dots, x_{k\xi};\ \xi = 1, \dots, n)$ of the original sample
$(x_{1\xi}, \dots, x_{k\xi};\ \xi = 1, \dots, n)$. The canonical correlation coefficient of
greatest practical interest is, of course, the largest one, namely $R^{(1)} = \sqrt{l_1}$.
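A small numerical illustration may help. The sketch below is not the general algorithm: it assumes made-up data with s = 1 and t = 2 (k = 3), in which case the determinantal equation for l has a single root, $l_1 = \sum_{v,w} u^{vw} u_{1v} u_{1w} / u_{11}$, where $\|u^{vw}\|$ is the inverse of the scatter matrix of the last two components. The helper names `internal_scatter` and `det3` are hypothetical. The code then verifies that the 3 × 3 determinant with $l_1 u_{11}$ in the upper left corner vanishes.

```python
def internal_scatter(rows):
    """Internal scatter matrix: sums of products of deviations from means."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[i] for r in rows) / n for i in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
             for j in range(k)] for i in range(k)]

def det3(a):
    """Determinant of a 3 x 3 matrix by cofactor expansion."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

# Invented sample of size n = 6 from a 3-dimensional distribution.
rows = [(1.0, 2.0, 1.0), (2.0, 1.0, 3.0), (3.0, 4.0, 2.0),
        (4.0, 3.0, 5.0), (5.0, 6.0, 4.0), (6.0, 5.0, 7.0)]
u = internal_scatter(rows)

# Inverse of the 2 x 2 scatter matrix of the last t = 2 components.
d = u[1][1] * u[2][2] - u[1][2] * u[2][1]
u_inv = [[u[2][2] / d, -u[1][2] / d],
         [-u[2][1] / d, u[1][1] / d]]

# Single root of the determinantal equation when s = 1.
quad = sum(u[0][1 + a] * u_inv[a][b] * u[0][1 + b]
           for a in range(2) for b in range(2))
l1 = quad / u[0][0]

# Check: the determinant with l1 * u_11 in the corner should vanish.
m = [[l1 * u[0][0], u[0][1], u[0][2]],
     [u[1][0], u[1][1], u[1][2]],
     [u[2][0], u[2][1], u[2][2]]]
residual = det3(m)
print(l1, l1 ** 0.5, residual)
```

Here the square root of `l1` is the largest (and only) canonical correlation coefficient, which for s = 1 coincides with the sample multiple correlation coefficient of the first component on the last two.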
To prove 18.9.1 we proceed as follows. It can be verified that

(18.9.6)
$\bar u_{11} = \sum_{p,q} u_{pq} c_{1p} c_{1q}$
$\bar u_{22} = \sum_{v,w} u_{vw} c_{2v} c_{2w}$
$\bar u_{12} = \sum_{p,w} u_{pw} c_{1p} c_{2w}$

where p, q range over 1, ..., s and v, w range over s + 1, ..., k. Now
consider the vectors $(c_{1p};\ p = 1, \dots, s)$ and $(c_{2v};\ v = s + 1, \dots, k)$
which will make R stationary subject to the conditions $\bar u_{11} = \bar u_{22} = 1$.
Using Lagrange multipliers $\lambda$ and $\mu$, the same vectors will also make

(18.9.7)  $\varphi = \bar u_{12} + \tfrac12 (1 - \bar u_{11}) \lambda + \tfrac12 (1 - \bar u_{22}) \mu$

stationary. The vectors which yield extrema of $\varphi$ are given by solutions of
the equations

(18.9.8)
$\dfrac{\partial \varphi}{\partial c_{1p}} = 0, \quad p = 1, \dots, s$
$\dfrac{\partial \varphi}{\partial c_{2v}} = 0, \quad v = s + 1, \dots, k,$

that is, the equations

(18.9.9)
$-\lambda \sum_{q} u_{pq} c_{1q} + \sum_{w} u_{pw} c_{2w} = 0, \quad p = 1, \dots, s$
$\sum_{q} u_{vq} c_{1q} - \mu \sum_{w} u_{vw} c_{2w} = 0, \quad v = s + 1, \dots, k.$

Multiplying the first equation in (18.9.9) by $c_{1p}$ and summing over p and
the second by $c_{2v}$ and summing over v it is evident that $\lambda = \mu$. Thus,
replacing $\mu$ by $\lambda$ in (18.9.9) we must solve for the required vectors. But to
obtain such solutions we must, of course, have

(18.9.10)  $\begin{vmatrix} -\lambda u_{pq} & u_{pw} \\ u_{vq} & -\lambda u_{vw} \end{vmatrix} = 0.$

We can factor the determinant so that (18.9.10) reads as follows:

(18.9.11)  $(-1)^{k} \lambda^{t - s} \begin{vmatrix} l\, u_{pq} & u_{pw} \\ u_{vq} & u_{vw} \end{vmatrix} = 0$

where $l = \lambda^2$. It is evident that the determinant in (18.9.11), that is, the
determinant (18.9.5), is a polynomial in l of degree s, and it can be shown
that its roots are real and positive. Let its roots be $l_1, \dots, l_s$. The eigenvectors
$(c^{(g)}_{1p};\ p = 1, \dots, s)$ and $(c^{(g)}_{2v};\ v = s + 1, \dots, k)$ corresponding to
$l_g$ are the solution of

(18.9.12)
$-\sqrt{l_g} \sum_{q} u_{pq} c^{(g)}_{1q} + \sum_{w} u_{pw} c^{(g)}_{2w} = 0, \quad p = 1, \dots, s$
$\sum_{q} u_{vq} c^{(g)}_{1q} - \sqrt{l_g} \sum_{w} u_{vw} c^{(g)}_{2w} = 0, \quad v = s + 1, \dots, k.$

If we insert these eigenvectors in (18.9.12), multiply the first equation by
$c^{(g)}_{1p}$ and sum over p, noting that $\sum_{p,q} u_{pq} c^{(g)}_{1p} c^{(g)}_{1q} = 1$, we obtain

(18.9.13)  $\sum_{p,w} u_{pw} c^{(g)}_{1p} c^{(g)}_{2w} = \sqrt{l_g}.$

But due to the conditions $\bar u_{11} = \bar u_{22} = 1$ the left-hand side of (18.9.13) is
the value of the correlation coefficient R between $z_{1\xi}$ and $z_{2\xi}$ in (18.9.2)
using the two eigenvectors given by (18.9.12). Hence, $l_g \le 1$, $g = 1, \dots, s$,
and the roots $l_1, \dots, l_s$, which are assumed to be all different with probability 1,
may be ordered $l_s < \cdots < l_1$, and lie on (0, 1). Since $l_1$ is the
largest root, then the eigenvectors which correspond to $l_1$ are the solutions
of (18.9.4). It can be shown that for any sample size $n > k$, $l_1 = 1$ with
probability 1 if and only if the first s and last t (s + t = k) random
variables in the k-dimensional distribution from which the sample is
drawn are linearly dependent.

To establish the fact that the correlation coefficient between $z_{1\xi}$ and $z_{2\xi}$
is zero for any two different eigenvectors $(c^{(g)}_{1p};\ p = 1, \dots, s)$ and
$(c^{(g')}_{2v};\ v = s + 1, \dots, k)$, $g \neq g'$, we consider the equations (18.9.12) with the
eigenvectors $(c^{(g)}_{1p};\ p = 1, \dots, s)$, $(c^{(g)}_{2v};\ v = s + 1, \dots, k)$ inserted. Multiply
the first and second equations by $c^{(g')}_{1p}$ and $c^{(g')}_{2v}$ respectively, and sum
over p and v. Then consider the corresponding equations and operations
with g and g' interchanged. Since $l_g \neq l_{g'}$ the reader will find by suitably
combining equations, that $\sum_{p,w} u_{pw} c^{(g)}_{1p} c^{(g')}_{2w} = 0$, that is, the correlation
coefficient R obtained by using $(c^{(g)}_{1p};\ p = 1, \dots, s)$ and
$(c^{(g')}_{2v};\ v = s + 1, \dots, k)$ vanishes if $g \neq g'$. He will similarly find that for $g \neq g'$

$\sum_{p,q} u_{pq} c^{(g)}_{1p} c^{(g')}_{1q} = \sum_{v,w} u_{vw} c^{(g)}_{2v} c^{(g')}_{2w} = 0.$

(b) Sampling Theory of Canonical Correlation Coefficients 

The problem of determining the sampling distribution of the squared
canonical correlation coefficients $l_1, \dots, l_s$ under general conditions is
very complicated. However, under certain conditions the distribution of
$l_1, \dots, l_s$ is a special case of the distribution given in (18.8.2). More
precisely we have the following result:
18.9.2  Let $(x_{p\xi};\ p = 1, \dots, s;\ \xi = 1, \dots, n)$ and $(x_{v\xi};\ v = s + 1, \dots, k;\ \xi = 1, \dots, n)$
be independent samples, the first being from the
s-dimensional normal distribution $N(\{\mu_p\}; \|\sigma_{pq}\|)$ and the second
from an arbitrary t-dimensional distribution, $s \le t$, $s + t = k$, such
that its internal scatter matrix $\|u_{vw}\|$ is nonsingular with probability 1.
Then the roots $0 < l_s < \cdots < l_1 < 1$ of (18.9.5) have a distribution
with probability element

(18.9.14)  $K^* \Bigl[\prod_{p=1}^{s} (1 - l_p)\Bigr]^{\frac12 (n - t - s - 2)} \Bigl[\prod_{p=1}^{s} l_p\Bigr]^{\frac12 (t - s - 1)} \prod_{p > q = 1}^{s} (l_q - l_p)\ dl_1 \cdots dl_s$

in the region for which $0 < l_s < l_{s-1} < \cdots < l_1 < 1$ and 0
otherwise, and where

(18.9.15)  $K^* = \pi^{s/2} \prod_{p=1}^{s} \dfrac{\Gamma\bigl(\frac{n - p}{2}\bigr)}{\Gamma\bigl(\frac{n - t - p}{2}\bigr)\, \Gamma\bigl(\frac{t + 1 - p}{2}\bigr)\, \Gamma\bigl(\frac{s + 1 - p}{2}\bigr)}.$

In proving 18.9.2 it is sufficient to consider the case where $(x_{v\xi};\ v = s + 1, \dots, k;\ \xi = 1, \dots, n)$
are arbitrary numbers and are linearly independent
(that is, $\|u_{vw}\|$ is nonsingular). We will show that in this case $l_1, \dots, l_s$
has distribution (18.9.14). Since (18.9.14) holds for any linearly independent
set of numbers, it clearly holds if $\{x_{v\xi}\}$ is a sample from an
arbitrary t-dimensional distribution provided $\|u_{vw}\|$ is nonsingular with
probability 1.

To proceed with the proof of 18.9.2 let us first multiply each side of the
equation in (18.9.5) on the left by the determinant

$\begin{vmatrix} \delta_{pq} & -\sum_{w'} u_{pw'} u^{w'v} \\ 0 & \delta_{vw} \end{vmatrix}.$

Using ordinary matrix multiplication, where $\|u^{vw}\| = \|u_{vw}\|^{-1}$ and $\delta$
is the Kronecker $\delta$, we obtain

(18.9.16)  $\begin{vmatrix} l\, u_{pq} - \sum_{v,w} u^{vw} u_{pv} u_{qw} & u_{pw} - \sum_{v,w'} u_{pw'} u^{w'v} u_{vw} \\ u_{vq} & u_{vw} \end{vmatrix} = 0$

where p, q range over 1, ..., s and v, w, w' range over s + 1, ..., k.
Since $\sum_{v} u^{w'v} u_{vw} = \delta_{w'w}$ it is evident that each element of the upper
right-hand block of (18.9.16) is 0. Therefore the roots of (18.9.5) are
identical with those of

(18.9.17)  $\Bigl| l\, u_{pq} - \sum_{v,w} u^{vw} u_{pv} u_{qw} \Bigr| = 0.$

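Now let us denote $(x_{p\xi} - \bar x_p)$ by $y_{p\xi}$ and $(x_{v\xi} - \bar x_v)$ by $y_{v\xi}$. Then

(18.9.18)  $u_{pq} = \sum_{\xi=1}^{n} y_{p\xi} y_{q\xi} = u^*_{pq} + u'_{pq}$

where

$u^*_{pq} = \sum_{\xi=1}^{n} \Bigl(y_{p\xi} - \sum_{v} b_{pv} y_{v\xi}\Bigr) \Bigl(y_{q\xi} - \sum_{w} b_{qw} y_{w\xi}\Bigr), \qquad u'_{pq} = \sum_{v,w=s+1}^{k} b_{pv} b_{qw} u_{vw}, \qquad b_{pv} = \sum_{w=s+1}^{k} u^{vw} u_{pw}.$

Substituting the expression for $b_{pv}$ into the expression for $u'_{pq}$ we find that
$u'_{pq} = \sum_{v,w} u^{vw} u_{pv} u_{qw}$. Therefore (18.9.17) can be written as

(18.9.19)  $|u'_{pq} - (u^*_{pq} + u'_{pq})\, l| = 0.$

It can be verified that under the assumptions of 18.9.2 the two sets of
random variables $\{u^*_{pq}\}$ and $\{u'_{pq}\}$ are independent sets of random variables
having Wishart distributions $W(s, n - t - 1, \|\sigma_{pq}\|)$ and $W(s, t, \|\sigma_{pq}\|)$.
It follows from 18.8.1, therefore, that the roots of (18.9.19) have the distribution
given in (18.8.2) with k replaced by s, n' replaced by t, and n
replaced by n − t − 1. This resulting distribution is (18.9.14), thus
completing the proof of 18.9.2.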

PROBLEMS 

18.1 If $(x_{1\xi}, \dots, x_{k\xi};\ \xi = 1, \dots, n)$ is a sample of size n (n > k) from the
normal distribution $N(\{\mu_i\}; \|\sigma_{ij}\|)$, show that the maximum likelihood estimator
of $(\mu_1, \dots, \mu_k)$ is $(\bar x_1, \dots, \bar x_k)$ and of $\|\sigma_{ij}\|$ is $\|u_{ij}/n\|$ where $(\bar x_1, \dots, \bar x_k)$ is the
vector of sample means, and $\|u_{ij}\|$ is the internal scatter matrix of the sample.

18.2 If the elements of the symmetric matrix $\|a_{ij}\|$, $i, j = 1, \dots, k \le m$,
are random variables having the Wishart distribution $W(k, m, \|\sigma_{ij}\|)$ show by
the use of characteristic functions that the elements of $\|a_{pq}\|$, $p, q = 1, \dots, s < k$,
have the Wishart distribution $W(s, m, \|\sigma_{pq}\|)$.

18.3 Distribution of correlation coefficients in sample from k-dimensional
normal distribution having independent components. If a sample $(x_{1\xi}, \dots, x_{k\xi};\ \xi = 1, \dots, n)$,
n > k, comes from the normal distribution $N(\{\mu_i\}; \|\sigma_{ii} \delta_{ij}\|)$
and if $\|u_{ij}\|$ is the internal scatter matrix of the sample, show that the p.d.f. of
the elements of the correlation matrix $\|r_{ij}\|$ where $r_{ij} = u_{ij}/\sqrt{u_{ii} u_{jj}}$ is given by

$\dfrac{\bigl[\Gamma\bigl(\frac{n - 1}{2}\bigr)\bigr]^{k}}{\pi^{k(k-1)/4} \prod_{i=1}^{k} \Gamma\bigl(\frac{n - i}{2}\bigr)}\ |r_{ij}|^{\frac12 (n - k - 2)}$

over the part of the space of the $\{r_{ij}\}$ for which $\|r_{ij}\|$ is positive definite, and
0 otherwise.

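18.4 (Continuation) Show that

$\mathscr{E}(|r_{ij}|^{h}) = \prod_{i=2}^{k} \dfrac{\Gamma\bigl(\frac{n - 1}{2}\bigr)\, \Gamma\bigl(\frac{n - i}{2} + h\bigr)}{\Gamma\bigl(\frac{n - i}{2}\bigr)\, \Gamma\bigl(\frac{n - 1}{2} + h\bigr)}, \qquad h = 0, 1, 2, \dots$

and hence that the distribution of $|r_{ij}|$ is identical with the distribution of
$\prod_{i=2}^{k} z_i$ where the $z_i$ are independent random variables having beta distributions
$Be(\tfrac12 (n - i), \tfrac12 (i - 1))$, $i = 2, \dots, k$, respectively.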


18.5 Distribution of sample correlation coefficient. If $\|u_{ij}\|$ is the internal
scatter matrix of a sample of size n from the two-dimensional distribution
$N(\{\mu_i\}; \|\sigma_{ij}\|)$, $i, j = 1, 2$, and making use of the fact that the elements of $\|u_{ij}\|$
have the Wishart distribution $W(2, n - 1, \|\sigma_{ij}\|)$, show that the p.d.f. of the
sample correlation coefficient $r = u_{12}/\sqrt{u_{11} u_{22}}$ can be expressed in the form

$\dfrac{2^{n-3} (1 - \rho^2)^{\frac12 (n - 1)} (1 - r^2)^{\frac12 (n - 4)}}{\pi\, \Gamma(n - 2)} \sum_{j=0}^{\infty} \dfrac{(2 \rho r)^{j}}{j!}\, \Gamma^2\Bigl(\dfrac{n + j - 1}{2}\Bigr)$

for $-1 < r < +1$ and 0 otherwise, where $\rho = \sigma_{12}/\sqrt{\sigma_{11} \sigma_{22}}$ is the correlation
coefficient in the population. [Fisher (1915).]

18.6 {Continuation) Applying the transformation 


r — Ui2l^UiiU22i ^ ■“ ^ i ic§ (^22/^ll)» 


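to the p.e. of the Wishart distribution $W(2, n - 1, \|\sigma_{ij}\|)$ show that the p.d.f.
of r can also be expressed in the form (due to Hotelling (1953)):

$\dfrac{(n - 2)\, \Gamma(n - 1)\, (1 - \rho^2)^{\frac12 (n - 1)} (1 - r^2)^{\frac12 (n - 4)} (1 - \rho r)^{-\frac12 (2n - 3)}}{\sqrt{2\pi}} \sum_{j=0}^{\infty} \dfrac{(\frac12 + \frac12 \rho r)^{j}\, \Gamma^2(\frac12 + j)}{\pi\, j!\ \Gamma(n - \frac12 + j)}.$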





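18.7 (Continuation) Show that the Hotelling form of the probability element
of r can be written in the form

$\dfrac{(n - 2)\, \Gamma(n - 1)\, (1 - \rho^2)^{\frac12 (n - 1)} (1 - r^2)^{\frac12 (n - 4)}}{\sqrt{2\pi}\ \Gamma(n - \frac12)\, (1 - \rho r)^{\frac12 (2n - 3)}} \int_0^1 \!\! \int_0^1 \bigl[1 - \tfrac12 (1 + \rho r)\, x y\bigr]^{-1} f(x)\, g(y)\ dx\, dy\, dr$

where f(x) and g(y) are the p.d.f.'s of the beta distributions $Be(\tfrac12, n - 1)$ and
$Be(\tfrac12, \tfrac12)$ respectively. Hence, by making the transformation

$r = \rho + \dfrac{1 - \rho^2}{\sqrt{n}}\, u, \qquad x = \dfrac{v}{n}, \qquad y = w$

show that the limiting distribution of

$u = \dfrac{(r - \rho) \sqrt{n}}{1 - \rho^2}$

as $n \to \infty$ is N(0, 1).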


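18.8 (Continuation) Show, by making use of 9.3.1, that if

$z = \tfrac12 \log \dfrac{1 + r}{1 - r}$

and

$\zeta = \tfrac12 \log \dfrac{1 + \rho}{1 - \rho}$

the limiting distribution of $(z - \zeta) \sqrt{n}$ as $n \to \infty$ is N(0, 1).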

18.9 Confidence ellipsoids for vector of means of a normal distribution.
If $(\bar x_1, \dots, \bar x_k)$ is the vector of means and $\|u_{ij}\|$ the internal scatter matrix of a
sample of size n, n > k, from the k-dimensional distribution $N(\{\mu_i\}; \|\sigma_{ij}\|)$,
show that

$n \sum_{i,j=1}^{k} u^{ij} (\bar x_i - \mu_i)(\bar x_j - \mu_j) < (1 - z_\gamma)/z_\gamma$

is a 100γ% confidence region (ellipsoid) for the vector of population means
$(\mu_1, \dots, \mu_k)$ where $z_\gamma$ is the upper 100γ% point of the beta distribution
$Be(\tfrac12 (n - k), \tfrac12 k)$. [Hotelling (1931).]

18.10 Equivalence of Hotelling's $T^2$ with likelihood ratio test that a normal
distribution $N(\{\mu_i\}; \|\sigma_{ij}\|)$ has a given vector of means. Let $\mathscr{H}(\omega; \Omega)$ be the
statistical hypothesis in which $\Omega$ is the $\tfrac12 k(k + 3)$-dimensional parameter space
for which $\mu_1, \dots, \mu_k$ are real and $\|\sigma_{ij}\|$ is positive definite and $\omega$ is the subset of $\Omega$
for which $\mu_1 = \mu^0_1, \dots, \mu_k = \mu^0_k$. Given a sample of size n from $N(\{\mu_i\}; \|\sigma_{ij}\|)$
with vector of means $(\bar x_1, \dots, \bar x_k)$ and internal scatter matrix $\|u_{ij}\|$, show that the
likelihood ratio $\lambda$ for testing $\mathscr{H}$ is

$\lambda = R_1^{\,n/2}$

where $R_1$ is given by (18.4.1). Hence $R_1$ (as well as Hotelling's $T^2$) is equivalent
to $\lambda$ for testing $\mathscr{H}$ and its distribution when $\mathscr{H}$ is true is the beta distribution
$Be(\tfrac12 (n - k), \tfrac12 k)$.
18.11 (Continuation) Equivalence of Hotelling's $T^2$ and Mahalanobis' $D^2$
with likelihood ratio test that two normal distributions with equal covariance
matrices also have equal vectors of means.

Let $\|u^{(1)}_{ij}\|$ and $\|u^{(2)}_{ij}\|$ be the internal scatter matrices of samples of sizes $n_1$
and $n_2$ respectively, from $N(\{\mu^{(1)}_i\}; \|\sigma^{(1)}_{ij}\|)$ and $N(\{\mu^{(2)}_i\}; \|\sigma^{(2)}_{ij}\|)$, $n_1, n_2 > k$,
$i, j = 1, \dots, k$.
Let $\|u_{ij}\|$ be the internal scatter matrix of the two samples pooled together into
a single sample. Let $\mathscr{H}(\omega; \Omega)$ be the statistical hypothesis in which $\Omega$ is the
$\tfrac12 k(k + 5)$-dimensional space of the parameters $\{\mu^{(1)}_i\}$, $\{\mu^{(2)}_i\}$, $\|\sigma_{ij}\|$, where
$\|\sigma^{(1)}_{ij}\| = \|\sigma^{(2)}_{ij}\| = \|\sigma_{ij}\|$. Let $\omega$ be the $\tfrac12 k(k + 3)$-dimensional subspace in $\Omega$ for which
$\mu^{(1)}_i = \mu^{(2)}_i$, $i = 1, \dots, k$. Show that the likelihood ratio $\lambda$ for testing $\mathscr{H}$ is

$\lambda = (R_1')^{\frac12 (n_1 + n_2)}.$

Hence $R_1'$ (as well as Hotelling's $T^2$ and Mahalanobis' $D^2$ for two samples) is
equivalent to $\lambda$ for testing $\mathscr{H}$ and its distribution when $\mathscr{H}$ is true is the beta
distribution $Be(\tfrac12 (n_1 + n_2 - k - 1), \tfrac12 k)$.

18.12 Generalize Problem 18.11 to the case of s + 1 samples of sizes
$n_1, \dots, n_{s+1}$ and show that the likelihood ratio $\lambda$ for testing the hypothesis $\mathscr{H}$
that all samples come from normal distributions having equal vectors of means
given that they come from normal distributions having equal covariance matrices
is equivalent to $R_s$ given in (18.5.3) with $\|a_{ij}\|$ as the sum of the internal scatter
matrices of the s + 1 samples, $\|a_{0ij} + a_{ij}\|$ as the internal scatter of all samples
pooled together as a single sample and $m = n_1 + \cdots + n_{s+1} - s - 1$.
Hence show that the distribution of $R_s$ in this case if $\mathscr{H}$ is true is given by 18.5.1
with the value of m just given.


18.13 Testing the hypothesis that the covariance matrices of two normal 
distributions are equal. 

In Problem 18.11 consider the hypothesis ^(co; fl) where remains unchanged 
but where co is the subspace in Q for which \\(j\]^\\ = Show that the 

likelihood ratio A for ^ is 


A = 






.ji2) 

Uij 


j 


n 


kn 


where /i = /ii + n^y and that if ^ is true 


^(AO 




."jI !*»■ 


r - 0,1. 2. 


r r ^”1 ~ ‘ + r j 


k 

n 


18.14 Model I analysis of variancefor vectors in two-factor experimental design. 

In an experimental layout of r rows Rx . Rr and s columns Cj,..., C, 

suppose we have a Ar-dimensional vector random variable {x \^,. •., 
associated with the cell formed by the fth row and rjth column, f =* 1 ,..., r, 
$\eta = 1, \dots, s$. Suppose these rs vector random variables are independent, such
that . Xf,^) has the Ar-dimensional normal distribution 

where ^ 0 , 1 * 1 ,..., A. Now let ^(co; 0) be the statistical 

hypothesis in which Q is the space of the A(r + ^ — 1 ) + \k{k + 1 ) independent 
parameters involved in the rs distributions mentioned above, and cu is the 
{ks + \k(Jc + l))-dimensional subspace of Ci for which the “row effects” are all 
zero, that is, 

“•*•** Mir. =*0, 1 = 1. k. 

Let the quantities x,^, m, W|., m.^ defined in (10.6.3), when computed 

from Xi^, f = 1,..., r, 17 = 1 ,..., be designated by x^^„ nti, 
respectively. Let 

Sif.. =“ 2 - 'Wi- "ite. - -ntj - 

Show that the likelihood ratio A for testing ^ is 


where 


A = 

\SuJ 

\Sij„ + Si^.Q\ 


and that if ^ is true (that is, if * • • = = 0, / = 1,...»A) the distri¬ 

bution of L is identical with that of in 18.5.1 with m and s replaced by 
(r — 1)(5 — 1) and (r — 1) respectively. 


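18.15 Distribution of sum of squares of least squares residuals. Suppose
$(x_{1\xi}, \dots, x_{k\xi};\ \xi = 1, \dots, n)$, where n > k, is a sample from the k-dimensional
distribution $N(\{\mu_i\}; \|\sigma_{ij}\|)$ and let $\|u_{ij}\|$ be the internal scatter matrix of this sample.
For a fixed sample show that the minimum of

$\sum_{\xi=1}^{n} (x_{1\xi} - \beta_1 - \beta_2 x_{2\xi} - \cdots - \beta_k x_{k\xi})^2$

with respect to $\beta_1, \dots, \beta_k$ is $|u_{ij}|/|u_{vw}|$, $i, j = 1, \dots, k$, $v, w = 2, \dots, k$.
By using methods similar to those used in Section 18.4 for finding moments, show
that

$\mathscr{E}\Bigl[\Bigl(\dfrac{|u_{ij}|}{|u_{vw}|}\Bigr)^{r}\Bigr] = \Bigl(\dfrac{2}{\sigma^{11}}\Bigr)^{r} \dfrac{\Gamma\bigl(\frac12 (n - k) + r\bigr)}{\Gamma\bigl(\frac12 (n - k)\bigr)}$

where $\sigma^{11}$ is the element in the first row and first column of $\|\sigma_{ij}\|^{-1}$. From this
sequence of moments deduce that

$\sigma^{11} |u_{ij}|/|u_{vw}|$

has the chi-square distribution C(n − k).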

18.16 (Continuation) Sampling distribution of multiple correlation coefficient.
If the minimizing values of $\beta_1, \dots, \beta_k$ are $\hat\beta_1, \dots, \hat\beta_k$ show that the correlation
coefficient R in the sample between $x_{1\xi}$ and $\hat\beta_1 + \hat\beta_2 x_{2\xi} + \cdots + \hat\beta_k x_{k\xi}$, $\xi = 1, \dots, n$,
is given by

$1 - R^2 = \dfrac{|u_{ij}|}{u_{11}\, |u_{vw}|}.$


Let 

Show that 
V<r, 0 ) = (1 + 2 <r] 


-•'[(oJ'""'']- 

•jje)-Kn-l)(J„ll +0)-r_A_ 


) 




li 

in - 

1 ^ 

-* 

—+»• 

) 


i)r| 

<n - 1\ 

r| 


^ 2 j 

2 ) 




Noting that 
^(1 - R^y 
show that 


'2 

i=0 


^(1 - R^y = 


J »0O ^00 

vir, 

0 Jo 

(1 - p2)‘("-»> ^ 


+ +(r)dh--dSr, r=0,l,2 . 

p2i p' " * • • I T-i I ~ ^ 




where p* is the squared multiple correlation coefficient between and ^ 2 ,.,, ,xj^ 
in the population given by 

1 - p* = ‘ 




From this value of the rth moment of 1 — show that the p.d.f. of is 


(1 - 

■i)(l _ 


2)(/{2)i(fc-3) ^ {f^R^y^ 



^ 2 J 


fn -fc'l 

^ 2 J 

1 <-• ,! r( 



for 0 < < I, and 0 otherwise, [Fisher (1928b).] 

18.17 {Continuation) Distribution of scatter of residuals. Consider the 
residuals 
U-i I

of these residuals. Show by methods similar to those of Section 18.4 that the 
minimum of this scatter with respect to the P's is 

\uij\l\uvw\* /,y = 1 ,..., A:, r, w = j + 1 . k 

and if the sample ..., 5 * 1 ,..., n) comes from the normal distribution 

NdtjiiY. Ik,,II) that 



Hence, verify that the distribution of \Uii\|\u^^J is identical with that of 





where , 2 , are independent random variables having gamma distri¬ 

butions G{\in — k\ G(i(n — A: + 1)),..., G(iin — A: + 5 — 1)) respectively. 

18.18 Principal components (eigenvalues) of a k-dimensional probability
distribution. The principal components of a $k$-dimensional probability distribution
with covariance matrix $\|\sigma_{ij}\|$ are the eigenvalues $\lambda_1, \ldots, \lambda_k$ of $\|\sigma_{ij}\|$
(that is, the roots of the equation $|\sigma_{ij} - \lambda\delta_{ij}| = 0$) where $+\infty > \lambda_1 \geq \cdots \geq \lambda_k > 0$.
Show that $\lambda_1 + \cdots + \lambda_k = \sigma_{11} + \cdots + \sigma_{kk}$ and $\lambda_1 \lambda_2 \cdots \lambda_k = |\sigma_{ij}|$.
The unit eigenvector $(c_{1p}, \ldots, c_{kp})$ corresponding to an eigenvalue $\lambda_p$ which is
distinct from all other eigenvalues is the unit vector satisfying

$$\sum_{j=1}^{k} (\sigma_{ij} - \lambda_p \delta_{ij})\, c_{jp} = 0, \qquad i = 1, \ldots, k.$$

If the eigenvalues $\lambda_1, \ldots, \lambda_k$ are all distinct, show that the unit eigenvectors
corresponding to these eigenvalues are mutually orthogonal, and hence that if
the axes are rotated so that the new axes have directions given by the eigenvectors
$(c_{11}, \ldots, c_{k1}), \ldots, (c_{1k}, \ldots, c_{kk})$ respectively, the probability distribution
in the new $k$-dimensional space has covariance matrix $\|\delta_{ij}\lambda_i\|$.
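
The identities in Problem 18.18 can be checked numerically in the simplest case $k = 2$, where the eigenvalues of a symmetric matrix have a closed form. The following is only an illustrative sketch, not part of the problem: the function names and the numerical covariance values are assumptions chosen for the example.

```python
import math

def eigen_sym_2x2(a, b, c):
    """Eigenvalues (largest first) and unit eigenvectors of [[a, b], [b, c]], assuming b != 0."""
    mid = (a + c) / 2.0
    rad = math.hypot((a - c) / 2.0, b)
    eigenvalues = (mid + rad, mid - rad)
    eigenvectors = []
    for lam in eigenvalues:
        # (a - lam) * x + b * y = 0 is solved by (x, y) = (b, lam - a).
        x, y = b, lam - a
        norm = math.hypot(x, y)
        eigenvectors.append((x / norm, y / norm))
    return eigenvalues, eigenvectors

def bilinear(u, v, a, b, c):
    """Return u' S v for S = [[a, b], [b, c]]."""
    return u[0] * (a * v[0] + b * v[1]) + u[1] * (b * v[0] + c * v[1])

# Illustrative covariance matrix (values chosen arbitrarily for the sketch).
a, b, c = 4.0, 1.5, 2.0
(lam1, lam2), (c1, c2) = eigen_sym_2x2(a, b, c)

trace_gap = (lam1 + lam2) - (a + c)        # sum of eigenvalues minus trace
det_gap = lam1 * lam2 - (a * c - b * b)    # product of eigenvalues minus determinant
dot = c1[0] * c2[0] + c1[1] * c2[1]        # inner product of the two eigenvectors

# Covariance matrix in the rotated axes: entries c_p' S c_q.
rotated = [[bilinear(u, v, a, b, c) for v in (c1, c2)] for u in (c1, c2)]

print("trace gap:", trace_gap)
print("determinant gap:", det_gap)
print("eigenvector inner product:", dot)
print("rotated covariance matrix:", rotated)
```

Up to rounding error, the two gaps and the inner product vanish and the rotated matrix is diagonal with the eigenvalues on the diagonal, which is what the problem asks to be shown in general.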

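18.19 (Continuation) Suppose a $k$-dimensional probability distribution has
covariance matrix $\|\sigma_{ij}\|$ where $\sigma_{ii} = \sigma^2$, $i = 1, \ldots, k$ and $\sigma_{ij} = \rho\sigma^2$, $i \neq j = 1,
\ldots, k$. Show that $\lambda_1 = \sigma^2[1 + (k - 1)\rho]$ and $\lambda_2 = \cdots = \lambda_k = \sigma^2(1 - \rho)$ and
that the unit eigenvector $(c_{11}, \ldots, c_{k1})$ associated with $\lambda_1$ is given by

$$c_{11} = \cdots = c_{k1} = \frac{1}{\sqrt{k}}.$$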
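
A quick numerical check of Problem 18.19 may help before attempting the proof. The sketch below builds the equicorrelated covariance matrix for illustrative values of $k$, $\sigma^2$ and $\rho$ (all assumed for the example, as are the helper names) and confirms that $(1/\sqrt{k}, \ldots, 1/\sqrt{k})$ is an eigenvector with eigenvalue $\sigma^2[1 + (k-1)\rho]$, while a vector orthogonal to it has eigenvalue $\sigma^2(1 - \rho)$.

```python
import math

def equicorrelation_matrix(k, sigma2, rho):
    """Covariance matrix with sigma2 on the diagonal and rho * sigma2 elsewhere."""
    return [[sigma2 if i == j else rho * sigma2 for j in range(k)] for i in range(k)]

def matvec(m, v):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

def eigen_residual(m, v, lam):
    """Largest absolute component of m v - lam v; zero when (lam, v) is an eigenpair."""
    return max(abs(x - lam * y) for x, y in zip(matvec(m, v), v))

# Illustrative values (assumed for this sketch only).
k, sigma2, rho = 5, 2.0, 0.3
cov = equicorrelation_matrix(k, sigma2, rho)

lam_first = sigma2 * (1 + (k - 1) * rho)      # claimed lambda_1
lam_rest = sigma2 * (1 - rho)                 # claimed lambda_2 = ... = lambda_k

unit_ones = [1 / math.sqrt(k)] * k                              # (1/sqrt(k), ..., 1/sqrt(k))
contrast = [1 / math.sqrt(2), -1 / math.sqrt(2)] + [0.0] * (k - 2)  # orthogonal to unit_ones

res_first = eigen_residual(cov, unit_ones, lam_first)
res_rest = eigen_residual(cov, contrast, lam_rest)
trace_gap = lam_first + (k - 1) * lam_rest - k * sigma2

print("residual for lambda_1:", res_first)
print("residual for the repeated eigenvalue:", res_rest)
print("sum of eigenvalues minus trace:", trace_gap)
```

The contrast vector is only one of the $k - 1$ independent directions orthogonal to the first eigenvector; any other such direction gives the same eigenvalue.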


18.20 Test of independence of two sets of variables. Let $\|u_{ij}\|$ be the internal
scatter matrix of a sample of size $n > k$ from the $k$-dimensional distribution
$N(\{\mu_i\}; \|\sigma_{ij}\|)$. Let $\|u_{pq}\|$, $p, q = 1, \ldots, s$, be the internal scatter matrix of the
sample using only the first $s$ variables, and $\|u_{vw}\|$, $v, w = s + 1, \ldots, k$, the internal
scatter matrix of the sample using only the last $t$ ($t = k - s$, $s \leq t$) variables.
Let $\mathcal{H}(\omega; \Omega)$ be the statistical hypothesis in which $\Omega$ is the $\frac12 k(k + 3)$-dimensional
parameter space in which $\mu_1, \ldots, \mu_k$ are real numbers and $\|\sigma_{ij}\|$ is positive
definite, whereas $\omega$ is the $(\frac12 k(k + 3) - st)$-dimensional subset of $\Omega$ for which

$$\sigma_{pw} = 0, \qquad p = 1, \ldots, s;\quad w = s + 1, \ldots, k.$$

Show that the Neyman-Pearson likelihood ratio $\lambda$ for testing $\mathcal{H}(\omega; \Omega)$ is given by

$$\lambda = L^{\frac12 n}$$

where

$$L = \frac{|u_{ij}|}{|u_{pq}| \cdot |u_{vw}|}.$$

Using methods similar to those used in Section 18.4(a) show that

$$\mathcal{E}(L^r) = \prod_{i=1}^{s} \frac{\Gamma\!\left(\frac12(n-i)\right)\Gamma\!\left(\frac12(n-t-i)+r\right)}{\Gamma\!\left(\frac12(n-i)+r\right)\Gamma\!\left(\frac12(n-t-i)\right)}, \qquad r = 0, 1, 2, \ldots$$

if $\mathcal{H}(\omega; \Omega)$ is true.

Verify for $s = 1$, $t = k - 1$ that $L$ has the beta distribution $Be(\frac12(n - k), \frac12(k - 1))$
and for $s = 2$, $t = k - 2$ that $\sqrt{L}$ has the beta distribution $Be(n - k, k - 2)$.
[Wilks (1935).]


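18.21 Test for sphericity of a normal distribution. Let $\|u_{ij}\|$ be the internal
scatter matrix of a sample of size $n$ from the $k$-dimensional distribution
$N(\{\mu_i\}; \|\sigma_{ij}\|)$. Let $\mathcal{H}(\omega; \Omega)$ be the statistical hypothesis in which $\Omega$ is the
$\frac12 k(k + 3)$-dimensional parameter space in which the $\{\mu_i\}$ are real and $\|\sigma_{ij}\|$ is
positive definite, and $\omega$ is the $(k + 1)$-dimensional subset of $\Omega$ in which
$\|\sigma_{ij}\| = \|\sigma^2\delta_{ij}\|$ where $\delta_{ij}$ is the Kronecker $\delta$, and $\sigma^2 > 0$. Show that the
Neyman-Pearson likelihood ratio $\lambda$ for testing $\mathcal{H}(\omega; \Omega)$ is given by

$$\lambda = L^{\frac12 n}$$

where

$$L = \frac{|u_{ij}|}{\bar{u}^{\,k}}, \qquad \bar{u} = \frac{1}{k}(u_{11} + \cdots + u_{kk})$$

and hence that $L$ is equivalent to $\lambda$ as a test for $\mathcal{H}(\omega; \Omega)$. Furthermore, show that if
$\mathcal{H}(\omega; \Omega)$ is true

$$\mathcal{E}(L^r) = k^{kr}\, \frac{\Gamma\!\left(\frac12 k(n-1)\right)}{\Gamma\!\left(\frac12 k(n-1) + kr\right)} \prod_{i=1}^{k} \frac{\Gamma\!\left(\frac12(n-i) + r\right)}{\Gamma\!\left(\frac12(n-i)\right)}, \qquad r = 0, 1, 2, \ldots$$

Verify that for $k = 2$, $\sqrt{L}$ has the beta distribution $Be(n - 2, 1)$. [Mauchly
(1940).]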
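
The final verification in Problem 18.21 can be checked numerically. The sketch below assumes the moment formula reads $\mathcal{E}(L^r) = k^{kr}\,\Gamma(\frac12 k(n-1)) / \Gamma(\frac12 k(n-1) + kr) \cdot \prod_{i=1}^{k} \Gamma(\frac12(n-i)+r)/\Gamma(\frac12(n-i))$ and that $Be(p, q)$ denotes the beta distribution with density proportional to $x^{p-1}(1-x)^{q-1}$; the function names and the value of $n$ are illustrative assumptions. If $\sqrt{L}$ has the distribution $Be(n-2, 1)$, then $\mathcal{E}(L^r)$ must equal the moment of order $2r$ of that beta distribution.

```python
import math

def sphericity_moment(r, n, k):
    """E(L^r) under the null hypothesis, from the assumed gamma-function formula."""
    half_df = 0.5 * k * (n - 1)
    log_m = k * r * math.log(k) + math.lgamma(half_df) - math.lgamma(half_df + k * r)
    for i in range(1, k + 1):
        log_m += math.lgamma(0.5 * (n - i) + r) - math.lgamma(0.5 * (n - i))
    return math.exp(log_m)

def beta_moment(h, p, q):
    """E(X^h) for X distributed as Be(p, q)."""
    return math.exp(
        math.lgamma(p + h) + math.lgamma(p + q)
        - math.lgamma(p) - math.lgamma(p + q + h)
    )

n = 12  # illustrative sample size, n > k
pairs = []
for r in range(0, 6):
    lhs = sphericity_moment(r, n, 2)      # E(L^r) for k = 2
    rhs = beta_moment(2 * r, n - 2, 1)    # E((sqrt L)^(2r)) if sqrt L ~ Be(n - 2, 1)
    pairs.append((lhs, rhs))
    print(r, lhs, rhs)
```

Agreement of the two columns for every $r$ is consistent with the stated beta distribution, since a distribution on a bounded interval is determined by its moments; it does not replace the analytical argument, which uses Legendre's duplication formula for the gamma function.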

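18.22 (Continuation) Test of the statistical hypothesis that a k-dimensional
normal distribution is symmetric in the k variables. Let $\mathcal{H}^*(\omega; \Omega)$ be the statistical
hypothesis in which $\Omega$ is defined as in the preceding problem but $\omega$ is the $(k + 3)$-dimensional
subset of $\Omega$ in which $\sigma_{ii} = \sigma^2 > 0$, $i = 1, \ldots, k$ and

$$\sigma_{ij} = \rho\sigma^2 \quad (i \neq j), \qquad -\frac{1}{k-1} < \rho < 1.$$

Show that the Neyman-Pearson likelihood ratio $\lambda$ for $\mathcal{H}^*(\omega; \Omega)$ is given by

$$\lambda = (L^*)^{\frac12 n}$$

where

$$L^* = \frac{|u_{ij}|}{(\bar{u} - \bar{u}^*)^{k-1}\,(\bar{u} + (k - 1)\bar{u}^*)}$$

and

$$\bar{u} = \frac{1}{k}(u_{11} + \cdots + u_{kk}), \qquad \bar{u}^* = \frac{1}{k(k-1)} \sum_{i \neq j} u_{ij};$$

hence $L^*$ is equivalent to $\lambda$ for testing $\mathcal{H}^*(\omega; \Omega)$.
Furthermore, show that if $\mathcal{H}^*(\omega; \Omega)$ is true,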


^(L*y 


J *00 /•« 

••• y<r. 
0 Jo 


fl + • • • + Srl»-V> %+---+Vh)d(i--- •••<*;» 


= (A - I)*-**^!* 


- 1)(A; - l) j 


where 


r = 0,1, 2,... 


■( 


in - m - 1 ) 


k 

n 


+ rik 


\ 


fei 


y>ir,p,q) = 

« 2**- n — 

... 


and 




-i(t-l)((n-l) + 2r] 


• [B + 2 y]-«"-l+ 2 r) 


,S = 


<»*(1 - #>) ’ oHl +(k- l)rf ■ 
Verify that for $k = 2$, $L^*$ has the beta distribution $Be(\frac12(n - 2), \frac12)$ and for
$k = 3$, $\sqrt{L^*}$ has the beta distribution $Be(n - 2, 1)$. [Wilks (1946).]

[For generalization of the test $L^*$ to testing $N(\{\mu_i\}; \|\sigma_{ij}\|)$ for symmetry in
variables within blocks see Votaw (1948).]

18.23 Noncentral $T^2$ distribution. If, in Section 18.4(a), the sample comes from
$N(\{\mu_i\}; \|\sigma_{ij}\|)$ where $(\mu_1, \ldots, \mu_k) \neq (\mu_1^0, \ldots, \mu_k^0)$, show that $T^2$ as defined in
(18.4.5) has p.d.f.

r 

_ !)]-}» [2(71 -1)J \2 7 

in - l)‘*r (^ 4 ^) r[1 + r/(« - !)]■ 

where

$$\delta^2 = n \sum_{i,j=1}^{k} \sigma^{ij}(\mu_i - \mu_i^0)(\mu_j - \mu_j^0).$$

[Hsu (1938).]

18.24 Generating moments of the scatter $|v_{ij}|$ of a sample from $N(\{\mu_i\}; \|\sigma_{ij}\|)$
without using Wishart distribution [Wilks (1934)]. Suppose a sample
$(x_{1\xi}, \ldots, x_{k\xi};\ \xi = 1, \ldots, n)$, $n > k$, is from a $k$-dimensional normal distribution
$N(\{\mu_i\}; \|\sigma_{ij}\|)$.
Let l|y,-,|| be the scatter matrix about the sample mean (//j,..., pf). 

Show (without the use of the Wishart distribution) that if $2r < n + 1 - k$


0 “ I * * I ^ "”2 ^ IX ^^*3> 

J — ao J — ao L \ 3 ? — 1 i J = 1 / J i,P 
Admissible statistical decision functions,
504 

Almost certain convergence, 106 
Amount of information, 353 
Analysis of variance, Model I, 297-305 
examples, see Experimental design 
Model II, 308-313 
most general form of, for Model I, 
407 

multidimensional, 561-564 
table, 301, 310 

A posteriori probability, 509 
A priori distribution, 401, 508 
Arc sine law, 471 

Asymptotic efficiency of an estimator, 
363, 380, 381 

Asymptotic normality of a distribution, 
256 

Asymptotically shortest confidence in¬ 
tervals, 374-376 

Asymptotically unbiased tests, definition 
of, 414 

equivalent, 415 
Autocorrelation function, 516 
Autocovariance, see Covariance func¬ 
tion of time series 
Autoregressive time series, 537 
Average outgoing quality limit of a 
sampling plan, 398 

Average sample number, of a Cartesian 
sequential test, 480 
of a general sequential test, 476 
of a probability ratio sequential 
test, 489-490 


Bayes solutions of statistical decision 
problem, 508-511 
Behrens-Fisher problem, 371 
Bertrand’s ballot problem, 470 
Beta distribution, definition of, 173 
mean of, 174 
moments of, 174 

relation between gamma distribution 
and, 174 

relation between Snedecor distribu¬ 
tion and, 187 

relation between Student distribution 
and, 186 
variance of, 174 
Beta function, 174 

Between-strata component of variance, 
314 

Bias, of estimators, 354 
of a statistical test, 395 
Binomial distribution, asymptotic nor¬ 
mality of, for large number of 
trials, 257 

asymptotically shortest confidence in¬ 
terval for parameter in, 391 
characteristic function of, 137 
confidence interval for parameter in, 
369 

definition of, 137 

distribution of sum (and mean) of 
sample from, 206 

efficiency of sample mean for esti¬ 
mating parameter in, 389 
inverse sine transformation for large 
samples from, 274, 391 
Binomial distribution, maximum likelihood estimator for parameter in,
391 

mean of, 137 

probability ratio sequential test for, 
494-496 

reproductivity of, 137 
sufficiency of sample mean for esti¬ 
mating parameter in, 389 
variance of, 137 

Binomial distributions, k-sample prob¬ 
lem for, 423 

likelihood ratio test for equality of 
parameters in several, 423 
two-sample problem for, 423 
Binomial waiting-time distribution, 
characteristic function of, 144 
definition of, 144 

logarithmic transformation for large 
samples from, 274 
mean of, 144 
variance of, 144 

Bivariate, cumulative distribution func¬ 
tion, 41 

probability density function, 46 
probability function, 44 
Bivariate normal distribution, charac¬ 
teristic function of, 161 
conditional distribution from, 163 
correlation coefficient between mar¬ 
ginal c.d.f. transforms of variables 
in, 188 

correlation coefficient between one 
variable and the marginal c.d.f. 
transform of the other in, 246 
covariance matrix of, 160 
definition of, 158 

distribution of correlation coefficient 
in samples from, 593-594 
distribution of ratio of random vari¬ 
ables having, 188 
marginal distributions from, 162 
means of, 160 

probability density function of, 161 
regression functions from, 163 
Blackwell-Rao theorem for obtaining 
improved estimators from suffi¬ 
cient statistics, 357 


Block frequencies, definition of, 443 
distribution of, 443 

Block frequency counts, definition of, 
444 

distribution of, 445 
moments of, 445 
Bloodtesting problem, 152 
Boltzmann’s H-function, 409 
Boolean field, definition of, 8 
generation of, 8 
Borel cylinder set, 97 
Borel field, definition of, 8 
generation of, 9 
in k-space, 10 
in 98 
minimal, 11 
on the real line, 9 
Branching process, 131-132 

Card-matching problem, 154 
Cartesian product, of probability 
spaces, 19 

of sample spaces, 16 
of sets, 16 

Cartesian sequential test, as a nonpara- 
metric sequential test, 482 
average sample number of, 480 
definition of, 479 
for exponential distribution, 498 
optimum construction for, 481, 499 
r-fold, 481-482 

r-fold for exponential distribution,
499 

Canonical correlation, 587-592
Canonical correlation coefficients, defi¬ 
nition of, 588 
distribution of, 591 
Cauchy distribution, definition of, 130 
distribution of mean of samples from, 
130 

efficiency of sample median for esti¬ 
mation of location parameter of, 
391 

failure of convergence in probability 
of mean of sample from, 256 
Cell frequencies, definition of, 431 
distribution of, 431 

Cell frequency counts, definition of, 433 
distribution of, 433 
moments of, 433-434 
Central limit theorem, 257-258
Change of variable in a probability ele¬ 
ment, 55 

Characteristic function, of binomial dis¬ 
tribution, 137 

of bivariate normal distribution, 161 
of chi-square distribution, 183 
of gamma distribution, 171 
of independent random variables, 120 
of linear function of random vari¬ 
ables, 121 

of multinomial distribution, 139 
of multivariate normal distribution, 
168 

of normal distribution, 157 
of a random variable, 113 
of a set, 395 

of a vector random variable, 119 
Chebychev inequality, 75, 255 
multidimensional, 92, 112, 274 
Chi-square distribution, approximate 
normality of, for large number of 
degrees of freedom, 189 
characteristic function of, 183 
definition of, 183 
degrees of freedom of, 183 
mean of, 183 
moments of, 183 
noncentral, 247 

relation between gamma distribution 
and, 183 

reproductivity of, 183 
variance of, 183 

Chi-square test, for independence in a 
contingency table, 424 
for a multinomial distribution, 425 
Choice behavior model, likelihood ratio 
test for, 426-427 
Classes of sets, Boolean, 8
Borel, 8 

completely additive, 8 
finitely additive, 8 
sigma-algebra, 8 

Class-size distributions, case of block 
frequency counts, 445 
case of cell frequency counts, 433 
the hypergeometric case, 153-154 
the multinomial case, 153 
Cochran’s theorem for a sample from 
a normal distribution, 212-214 
Coefficient of correlation, see Correlation
coefficient 

Coefficient of variation, 74 
Coin-tossing, long leads in, 471 
Column effects, in Latin square experi¬ 
mental designs, 304 
in three-way experimental designs, 
302 

in two-way experimental designs, 
297-298 

Component of variance, between-strata, 
314 

within-strata, 314 

Components of variance, see Variance 
components 

Composite statistical hypothesis, defini¬ 
tion of, 395 
nonparametric, 429 
Wald’s reduction of, to simple statisti¬ 
cal hypothesis, 401-402 
Conditional cumulative distribution 
function, 60, 61, 65 
Conditional probability, 24 
Conditional random variable, 61-66 
mean of, 84 

probability density function of, 64, 

66, 68 

probability function of, 62, 66, 68 
variance of, 84 

Conditional random variables, continu¬ 
ous, 63, 66, 68 
definition of, 61 
discrete, 62, 66, 68 

Confidence band for a continuous c.d.f., 
339-341 

Confidence coefficient, 282, 366 
Confidence contours, as nonparametric 
tests, 438-441 

for a continuous c.d.f., 336-339 
Confidence interval, asymptotic, from 
large samples, 372 

definition of, general case, 365-366 
for difference of means of two normal 
distributions with equal variances, 
324 

for difference of location parameters 
of two continuous distributions, 
469-470 
Confidence interval, for mean of a normal
distribution, 282
for mean of a Poisson distribution, 
369 

for median of a continuous c.d.f., 
330, 342 

for median of a finite population, 
342-343 

for median of a second sample from 
order statistics of a first sample, 
342 

for parameter of a binomial distribu¬ 
tion, 369 

for parameter of a hypergeometric 
distribution, 369 

for the (n + 1)st observation from
a normal distribution, 328 
for range of a rectangular distribu¬ 
tion, 390 

for variance of a normal distribution, 
282-283 

of fixed length for mean of a normal 
distribution, 497-498 

Confidence intervals, asymptotically 
shortest, 374-376 

construction of, from samples from 
continuous c.d.f.’s, 366-368 
construction of, from samples from 
discrete distributions, 368-369 
for main effects in experimental de¬ 
signs, 300 

for main effects in one-factor experi¬ 
mental design, 325-326 
for quantiles, 329-332 
for quantiles in finite populations, 333 
for quantile intervals, 332 
for regression coefficients in normal 
regression theory, 289 
simultaneous, see Simultaneous con¬ 
fidence intervals 

Confidence limits, see Confidence inter¬ 
vals 

Confidence regions, asymptotically 
equivalent, 385 

asymptotically smallest, 384-388 
definition of, general case, 381-382 
for mean and variance of a normal 
distribution, 383 

for parameters of a multinomial distribution, 388-389


Confidence regions, for regression co¬ 
efficients in normal regression 
theory, 290, 324 

for vectors of main effects in experi¬ 
mental designs, 300, 325-326 
for vector of means of a multivariate 
normal distribution, 594 
Consistency of a statistical test, 396 
Consistent estimator, definition of, 351 
multidimensional, 380 
Contagious distribution, 152 
Contingency table, chi-square test of in¬ 
dependence in, 424 
likelihood ratio test for independence 
in two-way, 423 

likelihood ratio test for independence 
in three-way, 424 

Continuous cumulative distribution 
function, confidence contour for, 
336-339 

confidence band for, 339-341 
definition of, 36, 45, 52 
Continuous random variable, definition 
of, 36, 45 

probability density function of, 37, 
47, 52 

probability element of, 37, 46 
Continuous waiting-time distribution, 
172-173 

Consumer’s risk, 397 
Convergence, almost certain, 106 
in distribution, 100, 103 
in probability, 99, 103, 105 
in the mean, 100 

of functions of components in sto¬ 
chastic processes, 102-105 
of maximum likelihood estimators, 
359-360, 379-380 
of sample mean in probability, 254 
of vector of sample means, 274 
set, 106 
stochastic, 99 
with probability one, 106 
Convolution of distribution functions, 
204 

Correlation coefficient, definition of, 78 
distribution of, in samples from a 
normal distribution, 593-594 
Fisher’s transformation for, 276 
multiple, 91, 95 
partial, 94 
Correlation coefficient, sampling distribution of multiple, 596-597
Correlation coefficients, canonical, 588 
distribution of canonical, 591 
distribution of matrix of, 592-593 
Correlation ratio, definition of, 86 
linear, 88, 91 
multiple, 86 

Covariance, between two linear func¬ 
tions of random variables, 83 
definition of, 78 

Covariance function of time series, defi¬ 
nition of, 516 
estimator for, 522-523 
examples, 537 

spectral representation of, 517, 520 
Covariance matrices, test for equality 
of, in two multivariate normal dis¬ 
tributions, 595 

Covariance matrix, definition of, 80 
inverse of, 80 
of finite population, 222 
of hypergeometric distribution, 136 
of linear functions of random vari¬ 
ables, 83 

of likelihood estimators of parame¬ 
ters, 380-381 

of multinomial distribution, 139 
of normal distribution, 160, 166 
of sample mean and median in large 
samples from a normal distribu¬ 
tion, 275 

Coverages, covariance matrix of, 248 
distribution of sums of, 237-238 
distribution of subsets of, 238 
for normal distribution, 343 
large-sample distribution of sums of, 
269-271 

multidimensional, 239-243 
one-dimensional, 235 
Covering theorem for probabilities, 14 
Characteristic roots of a scatter matrix, 
566 

Characteristic vectors of a scatter ma¬ 
trix, 564-567 
Craps, game of, 27 
Critical region, definition, 395 
similar, 396 
Critical set, 395 
Cumulants, 115 



Cumulative distribution function, abso¬ 
lutely continuous case, 36 
conditional, 60, 61, 65 
confidence band for continuous, 339- 
341 

definition of, the multidimensional 
case, 50 

definition of, the one-dimensional 
case, 33 

definition of, the two-dimensional 
case, 41 

determination of, from characteristic 
functions, 116-120 
empirical, 336 
marginal, 42, 50-51 
of a continuous random variable, 36, 
45, 52 

of a degenerate random variable, 36, 
44, 52 

of a discrete random variable, 34, 
43-44, 51-52 

of independent random variables, 
42-43, 51 

of mixed random variables, 47, 53 
of vector (multidimensional) random 
variables, 41, 50 
reproductivity of, 121 
Cylinder set, Borel, 17, 97 
definition of, 97 

Decision functions, see Statistical de¬ 
cision functions 
Decision space, 503 

Degenerate, cumulative distribution 
function, 36, 44, 52 
random variable, 36, 44, 52 
Degrees of freedom, of chi-square dis¬ 
tribution, 183 

of the Snedecor distribution, 186 
of the Student distribution, 184 
DeMoivre-Laplace theorem, on con¬ 
vergence of binomial distribution 
to normal distribution, 257 
multidimensional version, 259 
Difference of means, confidence inter¬ 
vals for, in normal distributions 
with equal variances, 324 
Differentiation of parametric distribu¬ 
tion functions, 345-346, 348-349 
Dirichlet distribution, conditional ran¬ 
dom variables in, 180 
Dirichlet distribution, covariance matrix of, 179
definition of, 177-178 
distribution of sums of random vari¬ 
ables having, 180-181 
examples of, 191, 237-238 
marginal distributions from, 179, 
180-181 

moments of, 179 
ordered, 182 

probability density function of, 177
sums of random variables having, 
180-181 

vector of means of, 179 
Discrete random variable, definition of, 
34, 44, 52 

mass points of, 34, 44, 52 
probability function of, 35, 44, 50 
Discriminant analysis, distribution of 
eigenvalues in, 581-587 
for case of several samples, 576- 
581 

for case of two samples, 573-576 
Distance between samples, Mahalanobis’, 560
Matusita’s, 252

Distribution, see Cumulative distribution function, Random variable
Double sampling, 472-473 

Edgeworth’s theorem on approximation 
to distribution of sample sum (or 
mean), 262-266 

Efficiency of an estimator, 351, 363, 
378 

Efficient estimator, definition of, 351, 
352 

for multidimensional case, 378 
Eigenvalues, distribution of, associated 
with one scatter matrix, 568-573 
distribution of, associated with two 
scatter matrices, 581-587 
distribution of, in discriminant analy¬ 
sis, 581-587 

of a multivariate probability distribu¬ 
tion, 598 

of a scatter matrix, 566 
Eigenvectors, associated with a pair of
scatter matrices, 581-587
in discriminant analysis, 574, 580 



Eigenvectors, of a multivariate proba¬ 
bility distribution, 598 
of a scatter matrix, 567 
Ellipsoidal estimator, see Confidence 
region 

Empirical cumulative distribution func¬ 
tion, 337 

Empty block test for two-sample prob¬ 
lem, 446-452 

Empty cell test for one-sample problem, 
433-438 
Empty set, 3 

Equality of variances, test for, in 
samples from normal distributions, 
423, 425 

Equivalent random variables, 57 
Estimating function, definition of, 372- 
374 

likelihood, 371 
multidimensional, 385 
regular, 373, 385 

Estimator, asymptotic efficiency of, 363, 
380 

bias of, 354 

Blackwell-Rao theorem for improv¬ 
ing, 357 

consistent, 351, 376 
efficient, 351, 378 

for covariance function in time series, 
516 

for mean of normal distribution hav¬ 
ing preassigned variance, 501 
for parameter of Poisson distribution, 
354 

for residual variance in linear regres¬ 
sion, 286-287 

interval, see Confidence interval 
linear, 277 
linear unbiased, 277 
lower bound for variance of, of a 
parameter, 353 
minimum variance linear, 278 
minimum variance linear estimator 
for population mean, 279-280 
minimum variance quadratic, for pop¬ 
ulation variance, 280-281 
multidimensional, 376 
point, 344, 350, 376 
quadratic, 278 

region, see Confidence region 
Estimator, sufficient, 351, 356
unbiased in the mean, 350, 376 
unbiased in the median, 350 
variance, 199, 218 
Expectation, see Mean value 
Events, see also Sets 
definition of, 2 
disjoint, 3 
independent, 24 
intersection of (product of), 3 
joint occurrence of, 5 
mutually disjoint, 4 
mutually independent, 24 
probability of, 11 
sequence of, 3 
union of (sum of), 4 
Event point, 1 

Experiment continuation events in se¬ 
quential analysis, 475 
Experimental design, complete two- 
factor, in Model I analysis of 

variance, 297-301 

complete two-factor, in Model II 

analysis of variance, 308-310 
complete three-way, in Model I analy¬ 
sis of variance, 301-304 
complete three-way, in Model II 

analysis of variance, 311-312 
the balanced incomplete two-way, in 
Model II analysis of variance, 

310-311 

Latin square, in Model I analysis of 
variance, 304-305 

Latin square, in Model II analysis of 
variance, 312-313 

likelihood ratio test for zero main 
effects in two-way, 426 
replicated one-factor, 325-326 
replicated two-factor, 326 
Exponential distribution, Cartesian se¬ 
quential test for, 498 
distribution of sample median in large 
samples from, 275 
maximum likelihood estimator for 
parameter in, 390 
order statistics from an, 249 
sufficiency of mean as estimator for 
parameter in, 390 
testing simple hypothesis for, 422 
Factorials, Stirling’s formula for large,
175-177 

Fiducial interval, 370 
Fiducial probability, 370 
Finite population, confidence interval 
for median of, 343 
confidence intervals for quantiles in, 
333 

covariance matrix of, 222 
covariance matrix of a sample from, 
217 

covariance matrix of vector of sample 
means in samples from, 222 
distribution of a pair of order statis¬ 
tics in sample from, 252 
distribution of largest element in a 
sample from, 251 

distribution of median of samples 
from, 251 

limiting distribution of sample means 
in large samples from a large, 268 
mean of, 217 

mean of sample covariance matrix in 
sample from, 222 

mean of sample mean in samples 
from, 218 

mean of sample variance in samples 
from, 218 

mean of symmetric functions of 
samples from, 219-221 
minimum variance linear estimator 
for mean of, 280 

minimum variance quadratic esti¬ 
mator for variance of, 280-281 
order statistics from, 243-245 
probability function of a sample 
from, 216 

random sampling from, 214-222 
variance of, 217

variance of difference of means of 
two samples from, 246 
variance of linear function of sample 
elements in sample from, 246 
variance of sample mean in samples 
from, 218

variance of sample sum in samples 
from, 219 

Fisher’s k-statistics, 200-201 
Fractiles of a random variable, 37 
Gamma distribution, characteristic function of, 171
definition of, 171 

distribution of sample sum (mean) in 
samples from, 207 
independence of sample mean and 
certain other functions of samples 
from, 249 

logarithmic transformation for large 
samples from, 391 
maximum likelihood estimator for 
parameter of, 391 
mean of, 171 
moments of, 171 

relation between beta distribution 
and, 174 

renewal process based on, 190 
reproductivity of, 171 
variance of, 171 

Gamma function, 170 
incomplete, 171 

Gaussian distribution, see Normal dis¬ 
tribution 

Gauss-Markov theorem, and weighing 
problems, 286 

on estimators for regression coeffi¬ 
cients, 285 

Generalized distance, between a sample
and a population, 557 
between two samples, 560 

Generalized Student distribution, see
Hotelling’s T²

Generalized variance of a multidimen¬ 
sional distribution, 547 

Generating function, card-matching, 154 
factorial-moment, 114, 119 
moment, 114, 119 
probability, 114, 130-132 

Goodness-of-fit criterion, Pearson’s,
262, 388-389 

Hankel's integral, 118 

Hotelling’s T², distribution for one-sample case, 556-559
distribution for two-sample case, 559-560

equivalence with a likelihood ratio 
test, 594 

equivalence with Mahalanobis’ D², 557, 560


Hotelling’s T², noncentral, 601
Hypergeometric distribution, confidence 
interval for parameter in, 369 
covariance matrix of, in k-variate
case, 136 

definition of, the one-variate case, 
134 

definition of, the k-variate case, 136
factorial moments of, 135, 136, 150 
mean of, 135, 136 
variance of, 135, 136 
Hypergeometric waiting-time distribu¬ 
tion, definition, 141 
examples of, 28, 153, 480 
factorial moments of, 143 
mean of, 143 
variance of, 143 

Interval estimator, see Confidence inter¬ 
val 

Incomplete beta function, 174 
Incomplete gamma function, 171 
Independence, nonparametric test for, 
467-468 
of events, 24 

of random variables, 42, 51 
of two sets of random variables, test 
for, 598-599 

test for, by rank correlation, 467 
test for, in contingency table, 423,424 
Infinitely divisible random variable, 189 
Information integral, 409 
Information matrix, 418 
Internal scatter matrix of a sample, 543 
Internal scatter of a sample, 543, 546 
Integral, evaluating, by random sam¬ 
pling, 328 

Lebesgue-Stieltjes, 22 
Inverse of a covariance matrix, 80 
Inverse sine transformation for large 
samples from a binomial distribu¬ 
tion, 274, 391 

Jacobian of a transformation, 57, 59 

k-dimensional content, 546
k-sample problem, for binomial distri¬ 
butions, 423 

for normal distributions, 425 
for Poisson distributions, 425 
k-statistics, 200-201 
Khintchine’s theorem on convergence in
probability of sample mean, 254 
Kolmogorov inequality, 107, 111 
Koopman-Pitman theorem on distribu¬ 
tions admitting sufficient statistics, 
393 

Kurtosis of a distribution, 265 

Large numbers, strong law of, 108 
weak law of, 99, 255 
Large samples, asymptotic distribution 
of order statistics in, 268-274 
asymptotic distribution of sample 
median in, 273 

asymptotic distribution of means in, 
from large finite populations, 268 
asymptotic distribution of sums of 
coverages in, 269-271 
asymptotic expansion of distribution 
of sample sum (or mean) in, 262- 
266 

asymptotic joint distribution of sev¬ 
eral order statistics in, 274 
asymptotic normality of distribution 
of maximum likelihood estimators 
in, 360-362, 380 

asymptotic normality of distribution 
of sample sums (or means) in, 256 
asymptotic normality of distribution 
of score in, 358, 379 
asymptotic normality of distribution 
of vector of sample means in, 258- 
259 

asymptotic normality of Student dis¬ 
tribution in, 189, 275 
distribution of quadratic form in 
sample means in, 261 
inverse sine transformation for, from 
binomial distribution, 274, 391 
limiting distribution of coverage on 
sample range in, 275 
limiting distributions of functions of 
sample means in, 259-261 
limiting distribution of sample mean 
and median in, from a normal dis¬ 
tribution, 275 

limiting form of multinomial distri¬ 
bution in, 262 

logarithmic transformation for, from 
a rectangular distribution, 275 


Large samples, logarithmic transfor¬ 
mation for, from waiting-time dis¬ 
tribution, 274 

square root transformation for, from 
Poisson distribution, 274, 365 
Largest element in a sample, distribu¬ 
tion of, in samples from a finite 
population, 251 

distribution of, in samples from a 
rectangular distribution, 248 
inequality for mean value of, 250 
probability element of, 237 
Largest segment, distribution of, gen¬ 
erated by n points on an interval, 
253 

Latent roots of a scatter matrix, see 
Eigenvalues 

Latent vectors of a scatter matrix, see 
Eigenvectors 

Latin square, definition of, 233 

in experimental designs, 304-305, 
312-313 

Layer (treatment) effects, in Latin 
square experimental designs, 304 
in three-way experimental designs, 
302 

Least squares, see also Normal regres¬ 
sion theory 

linear regression function, 87 
residual variance, 88, 91 
Lebesgue-Stieltjes integral, 22 
Legendre’s duplication formula for 
gamma functions, 175 
Level of significance, 395 
Levy-Cramér theorem, 122
Levy’s theorem, for one random vari¬ 
able, 116 

for a vector random variable, 119 
Likelihood estimating function, 371 
Likelihood, definition of, 351 
element, 351 

Likelihood ratio, definition of, 403-404 
large-sample distribution of, for com¬ 
posite hypotheses, 419-420 
large-sample distribution of, for 
simple hypotheses, 410-411 
Likelihood ratio test, see also Binomial
distribution, Contingency table, Exponential
distribution, Multinomial distribution, Normal distribution,
Poisson distribution

Likelihood ratio test, asymptotic power 
of, 413-417 
consistency of, 411-413 
definition of, 403-404 
equivalence of Hotelling’s T² with a,
594 

for choice behavior model, 427 
for the general linear statistical hypothesis, 405-408

for two-way experimental design, 426 
in normal regression theory, 405-408 
of a composite hypothesis, 419-422 
of a simple hypothesis, 417-419 

Linear dependence of random variables, 
56, 58 

Linear independence of random vari¬ 
ables, 56, 58 

Linear estimator, see also Minimum 
variance linear estimator 
definition of, 277 
unbiased, 278-279 

Linear function of random variables, 
191 

distribution of, in case of normality, 
158, 168-169 
mean of, 82 
variance of, 82 

Linear functions of random variables, 
asymptotic distribution of, in large 
samples from large finite popula¬ 
tions, 266-267 
correlation between, 93, 94 
covariance matrix of, 83 

Linear prediction in time series, 535- 
537 

Linear process in time series, 538 

Linear regression estimators, for coeffi¬ 
cients in, 283-286 
for residual variance in, 286-287 

Linear regression function, definition, 
83, 85 

in bivariate normal distribution, 163 
in multivariate normal distribution, 
170 

least squares, 87 


Linear statistical hypothesis, general, 
405 

Logarithmic transformation for large 
samples, from a gamma distribu¬ 
tion, 391 

from a rectangular distribution, 275 
from a waiting-time distribution, 274 
Loss function, 503 

Lower bound of variance of estimator, 
of parameter, 351-353, 392 
of function of parameter, 390 

Mahalanobis’ D², definition of, for one
sample, 557

definition of, for two samples, 560 
relation between, and Hotelling’s T²,
557, 560 

Mann-Whitney test, definition of, 460 
consistency of, for testing difference 
of location parameters, 469 
Marginal cumulative distribution func¬ 
tion, 42, 51 

Marginal probability density function, 
46, 52 

Marginal probability function, 44, 52 
Marginal sample spaces, 17 
Markov chains, 98 

Markov’s theorem on estimators for re¬ 
gression coefficients, 285 
Matrix, covariance, 80 
Gramian, 343, 546 
positive definite, 81 
scatter, 543 

Matrix sample, balanced incomplete, 
232-234 

complete second-order, 222-225 
complete third-order, 229-232 
for finite populations, 226-228 
Latin square, 233-234 
Matrix sampling, components of sum of 
squares in, 225, 230, 233, 234 
from finite populations, 226-228 
from infinite populations, 223, 230, 
232 

mean values of components of sum 
of squares in, 225, 230, 233, 234 
Maximum likelihood estimators, asymp¬ 
totic efficiency of, 363, 380 
Maximum likelihood estimators, asymptotic
normality of, in large samples, 360-362, 380
convergence of, 359-360 
definition of, 360 

for parameter in exponential distri¬ 
bution, 390 

for parameters in normal regression 
theory, 392 

for parameter of gamma distribution, 
391 

for parameters of multinomial distri¬ 
bution, 392 

functions of, having large sample vari¬ 
ance independent of parameter, 
364-365 

large sample distribution of, for mul¬ 
tidimensional case, 380 
multidimensional case, 379-380 
Mean deviation, 187 
of sample, 250 

efficiency of, as estimator for σ in
a normal distribution, 391 
Mean value of a random variable, 73- 
74 

Mean of a sample, see Sample mean 
Measure, see Probability measure 
Median, asymptotic efficiency of sample, 
as estimator of mean of normal 
distribution, 364 

confidence interval for, of a distribu¬ 
tion, 331, 342 

confidence interval for, of finite popu¬ 
lation, 343 

confidence interval for, of a second 
sample from order statistics of a 
first sample, 342 

distribution of sample, in samples 
from a finite population, 251 
distribution of sample, in samples 
from a continuous c.d.f., 237 
efficiency of sample, as estimator for
location parameter of Cauchy dis¬ 
tribution, 391 
Minimal Borel field, 11 
Minimal complete class of statistical 
decision functions, 504 


Minimax risk, 505 

Minimax solution of statistical decision 
problem, 505 

Minimum variance estimators, for end 
points of rectangular distribution, 
390 

for mean values of functions of suf¬ 
ficient estimators, 393 
for range of rectangular distribution, 
390 

Minimum variance linear estimator, 
definition of, 278 

of mean of several random variables 
having equal means, variances and 
covariances, 323 

for difference of two population 
means, 323 

for population mean, 280 
from several unbiased estimators of a 
parameter, 323, 327 

Minimum variance quadratic estimator 
for population variance, 280-281 
Moment-generating function, definition 
of, 114, 119 
factorial, 114, 119 
of a vector random variable, 119 
Moment-sequence, determination of dis¬ 
tributions from a, examples, 181, 
245, 532, 553, 562-564, 596-601 
uniqueness of a distribution with a 
given, 125-129 
Moments, 75, 79 
absolute, 76, 79 
absolute central, 76, 79 
central, 76, 79 
factorial, 76, 79 
of a sample, 245 
of a scatter, 545, 547, 552-554 
Sheppard’s corrections for, 328 
Monotone sequence of sets, definition 
of, 8 

probability law for, 13 
Multinomial distribution, asymptotic 
normality of, for large samples, 
262 

characteristic function of, 139 
confidence region for parameters of 
a, 388-389 



SUBJECT INDEX 


634 

Multinomial distribution, covariance 
matrix of estimators for param¬ 
eters in, 139 
definition of, 139 

distribution of vector of sample sums 
(or means) in samples from, 206 
likelihood ratio test for a simple hy¬ 
pothesis concerning a, 425 
marginal distribution from, 151 
Matusita’s inequality for two samples 
from a, 252 

maximum likelihood estimators for 
parameters of, 392 

Pearson’s goodness-of-fit criterion for 
samples from, 262, 425 
reproductivity of, 139 
test of simple hypothesis for, 425 

Multidimensional analysis of variance, 
general Model I, 561-564 
Model I, for three samples, 564
Model I, for two-factor experimental 
design, 596 

Multidimensional distribution, see Cu¬ 
mulative distribution function, 
Random variable 

Multidimensional estimator, see Esti¬ 
mator 

Multiple comparisons, see Simultaneous 
confidence intervals 

Multiple correlation coefficient, defini¬ 
tion of, 91 

sampling distribution of, 596-597 

Multiple correlation ratio, 86 

Multiple sampling, 473-474 

Multivariate cumulative distribution 
function, 50 

Multivariate normal distribution, char¬ 
acteristic function of, 168 
conditional distributions from a, 169- 
170, 191 

covariance matrix of a, 164 
definition of, 164 
distribution of exponent in, 64 
distribution of Hotelling’s T² in
samples from a, 556-561 
distribution of scatter in samples from
a, 554 

distribution of vector of sample sums 
(or means) in samples from, 207


Multivariate normal distribution, inde¬ 
pendence of means and scatter 
matrix in samples from a, 555- 
556 

Lagrange’s transformation for, 164-165

marginal distributions from, 168 
maximum likelihood estimators for 
parameters in, 392 

moments of a scatter of a sample 
from a, 552-554 

probability density function of, 164 
regression functions in, 169-170 
reproductivity of, 190
spherical, 190 
sphericity test for, 599 
test for symmetry of, in the variables, 
600 

vector of means of, 164 
Wishart distribution of variances and 
covariances in a sample from a, 
551 

Multivariate statistical analysis, 540 
Mutually disjoint events, 4 

Negative binomial distribution, see Binomial
waiting-time distribution
Neyman-Pearson theorem for most 
powerful test, 398-399 
Noncentral chi-square distribution, 247 
Noncentral Student distribution, 247 
Noncentral Hotelling T² distribution,
601 

Nonparametric composite statistical hy¬ 
pothesis, 429 

Nonparametric sequential test, Car¬ 
tesian sequential test as a, 482 
Nonparametric simple statistical hy¬ 
pothesis, 430 

Nonparametric statistical tests, chi- 
square test, 431 

confidence contours as, 438-441 
one-sample empty cell test, 433-435 
the Mann-Whitney test, 459-462 
the Smirnov test, 454-459 
two-sample empty block test, 446- 
452 

two-sample run test, 452-454 
Nonparametric test for independence, 
467-468 
Normal distribution, see also Bivariate
normal distribution, Multivariate
normal distribution 
a problem of order statistics from a, 
253 

asymptotic distribution of median of 
large samples from a, 273 
asymptotic efficiency of median of 
large sample from, 364 
asymptotic normality of mean and 
median of large samples from a, 
275 

bivariate, 158-163 

characteristic function of, 157, 161, 
168 

correlation coefficient between a 
random variable having a, and its 
c.d.f. transform, 246 
Cochran’s theorem for a sample from 
a, 212-214 

confidence interval for mean of, 282 
confidence region of parameters in, 
383 

confidence interval for variance of, 
282-283 

definition of, 156 

distribution of Student t in samples
from, 211 

distribution of sum (or mean) of 
samples from, 206 

distribution of variance of samples 
from, 208 

distribution of sum of squares in 
samples from, 208 

estimator for mean of, with pre¬ 
assigned variance, 501 
independence of mean and variance 
in samples from, 208-11 
inequality for integral of, 188 
likelihood ratio test that mean of, 
has specified value, 404-405 
likelihood ratio test that variance of, 
has specified value, 425 
Lukacs’ condition for a sample to be 
from a, 250 
mean of, 156 
multivariate, 163-170 
probability density function of, 156, 
161, 164 

probability ratio sequential test for 
mean of, 499-500 


Normal distribution, probability ratio 
sequential test for variance of, 
500-501 

reproductivity of, 158, 161, 190 
standardized form of, 156 
Stein’s confidence interval of fixed 
length for mean of, 497-498 
sufficient statistics for estimating 
parameters in, 390 
uniformly most powerful test for 
mean of, 401 
variance of, 156 

Normal distributions, confidence inter¬ 
val for difference of means of, 324 
distribution of difference of means of 
samples from two, 207 
estimator for difference of means of, 
with preassigned variance, 501 
likelihood ratio test for equality of 
means of several, with equal vari¬ 
ances, 422-423 

likelihood ratio test for equality of 
means of two, with equal variances, 
422 

likelihood ratio test for equality of 
variances of, 423, 425 
problem of k samples from, 425
Normal noise, definition of, 533 
test for whiteness of, 533-535 
Normal regression theory, confidence 
intervals for regression coefficients 
in, 289 

confidence regions for regression co¬ 
efficients in, 290, 324 
definition of, 288 

estimators for regression coefficients 
in, 247, 288 

in experimental designs, 297-305 
likelihood ratio test in, 405-408 
maximum likelihood estimators for 
parameters in, 392 
test for parallel regression lines in, 
426

Null hypothesis, 394 

Observable random variable, 278 
Operating characteristic function, of 
probability ratio sequential test, 
487-489 

of a sampling plan, 398 
of a sequential test, 476
Operating characteristic function, of a
statistical test, 395 
Optimum stratified sample, 317 
Ordered Dirichlet distribution, 182 
Order statistics, asymptotic distribution 
of, in large samples, 268-274 
coverages determined by, 235, 239-240

definition of, 234 

in samples from finite populations, 
243-245 

in two samples, 442-446 
for a probability density function 
symmetric in the variables, 70 
multidimensional coverages generated 
by, 238-243 

one-dimensional coverages generated 
by, 237-238 

ordering functions in theory of, 238 
probability element of, 236 
sample blocks generated by, 235 
two-sample problems concerning,
469, 470

Parallel regression lines, test for, in 
normal regression theory, 426 
Parameter, see Population parameter 
Parameter space, 344 
Parametric cumulative distribution
function, differentiation of, 345,
348

regular, 348-350 
Partial correlation coefficient, 94 
Pascal distribution, see Binomial wait¬ 
ing-time distribution 
Pearson’s goodness-of-fit criterion, 262, 
388-389 

Pearson Type III distribution, see 
Gamma distribution 
Percentile, 37 

Periodogram, analysis, 529-533 
definition of, 530 
of stationary time series, 538-539 
Pivotal point of the scatter of a sample, 
541 

Pivotal function for determining fiducial 
distributions, 370 
Point estimator, 344, 350, 376 
Poisson distribution, asymptotically
shortest confidence interval for
parameter in, 391

Poisson distribution, characteristic function
of, 140

confidence interval for mean of, 369 
cumulative distribution function of, 
as a definite integral, 152 
definition of, 140 

distribution of sum (or mean) of 
samples from, 206 
efficiency of estimator for parameter 
of, 354 

estimator for parameter of, 354 
mean of, 140 

probability ratio sequential test for 
mean of, 501 

problem of k samples from, 425 
reproductivity of, 140 
square root transformation for large 
samples from, 274, 365 
sufficiency of estimator of parameter 
in, 356-357 
variance of, 140 

Poisson distributions, likelihood ratio 
test for equality of parameters in 
two, 424-425 
Poisson process, 192 
Population parameter, definition of, 344 
lower bound of estimator for, 353 
score of a, 353 
Positive definite matrix, 81 
Power of a statistical test, 395 
Principal components of a scatter ma¬ 
trix, 568 

Probability density function, of a con¬ 
ditional random variable, 64, 66, 
68 

of a random variable, 37 

of a vector random variable, 46, 52 

marginal, 46, 52 

Probability element, of a random vari¬ 
able, 37 

of a vector random variable, 46 
transformation (change of variable) 
of, 55, 57, 59 

Probability function, of a conditional 
random variable, 62, 66, 68 
marginal, 44, 52 
of a random variable, 35 
of a vector random variable, 43, 52 
Probability-generating function, 114
examples of, 130-132 
Probability measure, covering theorem 
for, 14 

definition of, 11 
extension of, 15 
for stochastic process, 96-99 
Probability of occurrence, of all n 
events (of intersection of events), 
28 

of at least one of n events (of union 
of events), 12, 28 
of m or more of n events, 28, 29 
of exactly m of n events, 28 
Probability ratio sequential test, aver¬ 
age sample number of, 489-490 
boundary constants for, 485 
definition of, 482 
efficiency of, 490-492 
for binomial distribution, 494-496 
for mean of a normal distribution,
499-500

for mean of Poisson distribution, 501 
for variance of a normal distribution,
500-501

operating characteristic function of, 
487-489 

termination of, with probability one, 
483-484 

truncation of, 492-494 
Probability spaces, component, 19 
definition of, 11 
statistical independence of, 19 
Problem of k samples, from binomial 
distributions, 423 
from normal distributions, 425 
from Poisson distributions, 425 
Producer’s risk, 397 
Product space, 16 
Propagation of errors, 190 
Proper linear dependence of random 
variables, 56, 58 

Proportional stratified sample, 316 

Quadratic estimator, 278 
Quantile, definition, 37 
test, 428-430 

Quantile intervals, confidence intervals 
for, 332-333 


Quantiles, confidence intervals for, 
329-332 

confidence intervals for, in finite 
populations, 333 
Quartile, lower, 38 
upper, 38 
Queuing, 193 

Radon-Nikodym theorem, 25 
Random function, 514 
Random intervals, mean of union of, 
250 

Randomization tests, Fisher-Pitman 
test, 464-465 

for sample components, 462-465 
for sample ranks, 465-468 
for two-way experimental design, 465 
Hoeffding’s test of independence, 
467-468 

Mann-Whitney test, 459-462 
rank correlation test, 467 
Random sampling, from finite popu¬ 
lation, 214-222 

from infinite population, 195-198 
unbiased estimator of definite integral 
by, 328 

Random set, 367 

Random variable, absolute central 
moments of, 76, 79 
absolute moments of, 76, 79 
bounded, 20 

central moments of, 76, 79 
coefficient of variation of, 74 
conditional, 25, 61, 66 
continuous, 36, 45, 52 
cumulants of, 115 
definition of, 19, 20 
degenerate, 36, 44, 52 
discrete, 34, 43, 51 
factorial moments of, 76, 79 
integration of, 21-24 
mean of, 73-74 
measurable function of, 53 
moments of, 75, 79 
multidimensional (vector), 19, 20 
observable, 278 
sample space of, 19 
semi-invariants of, 115 
simple, 21 

standard deviation of, 74 
Random variable, variance of, 74
vector (multidimensional), 19 

Random variables, conditional, 61-66 
correlation coefficient between two, 
78 

covariance between two, 78 
distribution of product of two, 69 
distribution of ratio of two, 69 
distribution of sum of two, 69 
equivalent, 57 
measurable functions of, 53 
independent, 42, 51 
linearly independent, 56, 58 
linearly dependent, 56, 58 
mixed, 47, 53 
mutually independent, 51 
properties of means of, 73-74 
uncorrelated, 78 

Range, of a random variable, 34 
of a rectangular distribution, 155 
of a sample, 237 

Rank correlation test for independence,
467 

Rao-Kendall theorem on moment-se¬ 
quences, 128-129 

Rectangular distribution, confidence in¬ 
terval for range of, 390 
definition of, 155 

distribution of geometric mean of a 
sample from a, 249 
distribution of largest element of a 
sample from a, 248 
distribution of largest gap in a sample 
from, 253 

distribution of mean (or sum) of 
samples from, 204 

distribution of median of samples 
from a, 248 

distribution of order statistics in 
sample from a, 235 
distribution of product of elements of 
sample from, 189 

distribution of range of samples 
from, 248 

efficient estimator for range of, 390 
logarithmic transformation for large 
samples from a, 275 
mean of product of largest elements 
in several samples from a, distribu¬ 
tion of, 249 


Rectangular distribution, minimum 
variance estimators of range and 
midpoint of, 390 

order statistics of a sample from a, 
235 

range of, 155 

sufficient estimator for endpoints of, 
390 

sufficient estimator for range of, 390 
Regression coefficients, confidence in¬ 
tervals for, in normal regression 
theory, 289 

confidence region for, in normal re¬ 
gression theory, 290 
definition of, 83, 85 
estimators for, 246-247, 283-286 
Gauss-Markov theorem on estimators 
for, 285 

Markov theorem on estimators for, 
284 

Regression function, definition of, 83, 
85 

in bivariate normal distribution, 163 
in multivariate normal distribution, 
170

least squares, 87 
linear, 83, 85 

Regular parametric distribution func¬ 
tions, 348-350 
Renewal process, 190-191 
Reproductivity of a distribution, 121 
Residual variance, definition of, 84 
estimator for, in linear regression, 
286-287 

least squares, 88, 91 
Response surface analysis, 297 
Risk function, 503 
Risk vectors, 504 

Row effects, in Latin square experi¬ 
mental designs, 304 
in two-way experimental designs, 298 
in three-way experimental designs, 
302 

Run test, for two-sample problem, 
452-454 

Runs, definition of, 145 
distribution functions of, 146-147 
factorial moments of, 148 
mean of, 148 
Runs, of successes, 153
variance, 148 

Sample, covariance matrix, 197 

definition of, from finite population, 
215 

definition of, from infinite population, 
195 

Fisher’s k-statistics of a, 200-201
matrix, see Matrix sample 
mean deviation of, 250 
mean of, 196 
moments of a, 245 
semi-invariants of a, 200-201 
symmetric functions of a, 201-203 
variance of a, 199 
Sample blocks, 235, 238-239 
Sample cluster, 541, 545 
Sample mean, as minimum variance 
linear unbiased estimation for 
population mean, 279-280 
asymptotic expansion of distribution 
of, in large samples, 262-266 
asymptotic normality of, in large 
samples, 256 

asymptotic normality of functions of, 
in large samples, 259-260 
characteristic function of, 205 
convergence in probability of, 254- 
255 

definition of, 196 

distribution of, in large samples from 
large finite populations, 266-268 
distribution of, in samples from a 
Cauchy distribution, 130 
distribution of, in samples from a 
normal distribution, 206 
efficiency of an unbiased linear esti¬ 
mator of a population mean rela¬ 
tive to the, 327 

geometric, distribution of, in samples 
from a rectangular distribution, 249 
mean of, in samples from an infinite 
population, 198 

mean of, in samples from a finite 
population, 218 

variance of, in samples from an in¬ 
finite population, 198 
variance of, in samples from a finite 
population, 218 


Sample means, asymptotic distribution
of Studentized difference of, in 
large samples, 275 
asymptotic normality of vector of, in 
large samples, 258-259 
distribution of difference of, in 
samples from two normal distri¬ 
butions, 207 

distribution of vector of, in samples 
from multivariate normal distribu¬ 
tions, 207 

limiting distribution of quadratic 
form in, for large samples, 261 
probability inequalities for vector of, 
274 

variance of difference of two, from 
finite population, 246 
vector of, 197 

Sample median, asymptotic distribution 
of c.d.f. transform of, 275 
asymptotic distribution of, in large 
samples, 273 

asymptotic efficiency of, in large 
samples from normal distribution, 
364 

definition of, 237 

distribution of, in large samples from 
exponential distribution, 275 
distribution of, in samples from a 
rectangular distribution, 248 
efficiency of, for estimating center of 
Cauchy distribution, 391 
probability element of, 237 
Sample moments, variance of, 245 
Sample point, 1 

Sample range, cumulative distribution 
function of, in samples from a 
continuous c.d.f., 248 
distribution of, in samples from rec¬ 
tangular distribution, 248 
limiting distribution of coverage on, 
in large samples, 275 
mean value of, in samples from a 
continuous c.d.f., 248 
probability element of, 237 
Sample space, Cartesian product, 17 
definition of, 1 
marginal, 17 

of a random variable, 19 
Sample sum, asymptotic expansion of
distribution of, in large samples, 
262-266 

asymptotic normality of, in large 
samples, 256 

characteristic function of, 205 
definition of, 196 

distribution of, in samples from a bi¬ 
nomial distribution, 206 
distribution of, in samples from a 
gamma distribution, 207 
distribution of, in samples from a 
normal distribution, 206 
distribution of, in samples from a 
Poisson distribution, 206 
distribution of, in samples from a 
rectangular distribution, 204 
general distribution of, 203, 205 

Sample sums, distribution of vector of, 
in samples from a multinomial dis¬ 
tribution, 206 

distribution of vector of, in samples 
from a multivariate normal dis¬ 
tribution, 207 
vector of, 197 

Sample variance, definition of, 196 
distribution of, in samples from a 
normal distribution, 208 
mean of, in samples from an infinite 
population, 199 

mean of, in samples from a finite 
population, 218 
variance of, 200 

Sampling without replacement, see 
Sampling from a finite population 

Scatter of a multidimensional distribu¬ 
tion, 547 

Scatter of a sample, distribution of, in 
samples from a multivariate nor¬ 
mal distribution, 554 
internal, 543 

mean value of a, 545, 547 
moments of, in samples from a multi¬ 
variate normal distribution, 552- 
554 

one-dimensional, 541 
two-dimensional, 543 
multidimensional, 546 
pivotal point of, 541, 543, 545 


Scatter matrices, distribution of eigen¬ 
values associated with a pair of, 
581-587 

Scatter matrix of a sample, between- 
sample, 564, 574 
characteristic roots of, 566 
characteristic vectors of, 567 
definition of, 543 

distribution of eigenvectors of, 567 
eigenvalues of, 566 
eigenvectors of, 567 
internal, of a sample, 543, 546 
latent roots of a, 566 
latent vectors of a, 567 
principal components of a, 568 
within-sample, 559, 564, 574 
Scatter of residuals, sampling distribu¬ 
tion of, 597-598 

Score of a parameter, asymptotic nor¬ 
mality of, in large samples, 358, 
379-380 

definition of, 353 

Semi-invariants, of a random variable, 
115 

of a sample, 200-201 
Sequence of events, see Sequence of sets 
Sequence of sets (events), 3 
contracting (decreasing), 7 
countably infinite, 3 
expanding (increasing), 7 
inferior limit of, 7 
limit of, 7 
monotone, 8 
superior limit of, 7 

Sequential analysis, see Sequential estimation,
Sequential process, Sequential
test

Sequential estimation, 496-497 
Sequential process, Cartesian, 479, 498 
definition of, 475 

experiment continuation events in, 
475 

probability ratio, 482 
termination of probability ratio, with 
probability one, 483-484 
Sequential test, see also Cartesian se¬ 
quential test, Probability ratio se¬ 
quential test 

average sample number of, 476 
average sample number in probabil¬ 
ity ratio, 489-490 
Sequential test, boundary constants for
probability ratio, 485 
Cartesian, 479-481, 498 
Cartesian, as a nonparametric se¬ 
quential test, 482 
criteria for choosing a, 477-479 
degenerate, 475 

efficiency of probability ratio, 490- 
492 

operating characteristic function of, 
476 

operating characteristic function of 
probability ratio, 487-489 
optimum Cartesian, 481, 499 
probability ratio, for binomial distri¬ 
bution, 494-496 

probability ratio, for mean of nor¬ 
mal distribution, 499-500 
probability ratio, for mean of Poisson 
distribution, 501 

probability ratio, for variance of a 
normal distribution, 500-501 
probability ratio, general, 482 
r-fold Cartesian, 481-482 
strength of a, 478 
structure of a, 474-479 
truncation of probability ratio, 492- 
494 

zones of acceptance, indifference and 
rejection for a, 478 
Serial correlation coefficient, 533-535 
Serial correlation function of time 
series, 516 

Set function, completely additive, 11 
definition of, 11 
Sets, associative law for, 6 

Boolean field (Boolean algebra) of, 8 
Borel field (sigma-algebra) of, 8 
bounded, 31 

Cartesian product of, 16 
commutative law for, 6 
complement of, 4 
completely additive class of, 8 
difference of, 4 
disjoint, 3 

distributive law for, 6 
empty (null), 3 
equal, 3 
fields of, 8 

finitely additive class of, 8 


Sets, intersection of (product of), 3
mutually disjoint, 4 
outer measure of, 16 
random, 367 
union of (sum of), 4 
Sheppard’s corrections for moments, 
328 

Sign test, 430 
Significance level, 395 
Simple random sampling, from a finite 
population, 215 

from an infinite population, 195 
Simple statistical hypothesis, definition, 
395 

nonparametric, 430 

Simultaneous confidence intervals, a 
probability inequality for, 291 
definition of, 290 

for differences of interactions in 
three-way experimental design, 327 
for differences of main effects in two- 
way experimental design, 326 
for differences of means of several 
normal distributions, 326 
for linear combinations of regression 
coefficients, 325 

for linear combinations of several 
normal variables, 325 
Scheffé’s method, 291-294
Tukey’s method, 294-297 
Skewness of a distribution, 265 
Slutzky’s theorem in time series, 537- 
538 

Smallest element, probability element 
of, in sample, 237 

Smirnov test for two samples, 454-459 
Snedecor (variance-ratio) distribution, 
chi-square distribution as a limit¬ 
ing form of, 191 
definition of, 186 
degrees of freedom of, 186 
in Model I analysis of variance, 407 
mean of, 187 
moments of, 187

probability density function of, 186 
relation between beta distribution 
and, 187 
variance of, 187 

Spectral density function, definition of, 
520 




Spectral density function, of a geometrically
decreasing covariance
function, 539 
of a linear process, 538 
of smoothed time series, 537-538 
Spectral distribution function, definition
of, 520

estimation of, 523
Spectral mass, 520

Spectral representation of covariance
function, 517, 520 

Square root transformation, for large 
samples from gamma distribution, 
274 

for large samples from Poisson dis¬ 
tribution, 274, 365 

Standard deviation of a random vari¬ 
able, 74 

Statistic, see also Estimator 
consistency of a, 351, 376 
definition of, 196 
efficiency of a, 351, 378
sufficiency of a, 351 
Statistical decision functions, admissible, 
504 

complete class of, 504 
definition of, 502 
minimal class of, 504 
Statistical decision problem, Bayes’ so¬ 
lution of, 508-511 

minimax solution of, 505, 504-508 
Statistical hypothesis, composite, 394- 
395 

definition of, 394-395 
general linear, 405
simple, 394-395 

Statistical independence, of probability 
spaces, 19

of random variables, 42, 51 
Statistical test, see also Nonparametric 
statistical tests, Probability ratio
tests, and Sequential tests 
consistent, 396 
critical region of, 395 
definition of, 395 

for whiteness of normal noise, 533- 
535 

operating characteristic function of,
395 

power of, 395 


Statistical test, size of a, 396 
unbiased, 395 

uniformly most powerful, 396 
Stirling’s formula for large factorials, 
175-177 

Stochastic convergence, 99 
Stochastic process, birth, 192 
branching, 131-132 
definition of, 96 
finite, 68 

linear process, 538 
Markov, 98 
Poisson, 192 
queuing, 193 
random sampling, 195 
sequential process, 475 
stationary, 515 
strictly stationary, 515 
weakly stationary, 516 
Stratified population, definition of, 313 
estimator for mean of, from general
stratified sample, 315 
estimator for mean of, from optimum 
stratified sample, 317 
estimator for mean of, from propor¬ 
tional stratified sample, 316 
estimator for mean of by two-stage 
sampling, 320 

estimator from stratified sample for 
mean of, with minimum variance 
at fixed cost, 318 

estimator from two-stage sample for 
mean of, with minimum variance 
at fixed cost, 322-323 
two-stage sampling of, with unknown 
strata sizes, means and variances, 
327 

two-stage sampling of, having many
strata, 318-321 

Stratified sample, definition of, 314 
estimator of population mean from 
general, 315 

estimator of population mean from 
optimum, 317 

estimator of population mean from 
proportional, 316 
Strong law of large numbers, 108 
Student distribution, asymptotic normality
of, in large samples, 189, 275
Student distribution, definition of, 184
degrees of freedom in, 184 
for a sample from a normal distribu¬ 
tion, 211 

for two samples, 245 
mean of, 185 
moments of, 185 
noncentral, 247 

probability density function of, 184 
variance of, 185 
Studentized range, 294 
Student ratio, as a likelihood ratio test, 
404-405 

definition of, 184, 211 
example of unbiased test, 396-397 
Sufficient statistics, Blackwell-Rao theo¬ 
rem on, 357 
definition of, 351 

factorability criterion for, 354-356 
form of distribution admitting, 393 
multidimensional, 356 
Rao’s theorem on, 393 
Sum of squares, components of, in ma¬ 
trix sampling, 225, 230, 233, 234 
distribution of, in samples from a 
normal distribution, 208 
Symmetric functions, mean of, in 
samples from a finite population, 
219-221 

minimum variance property of, for 
samples, 280-281 
of a sample, mean of, 201-203 
variance of, 202 

Test, see Statistical test 
Time series, auto-covariance, 516 
autoregressive, 537 
covariance function of, 516 
definition of, 514 

estimator for covariance function of,
522

estimator for mean of, 522 
estimator for spectral distribution of,
523

estimators for coefficients in trigono¬ 
metric, 528-529 

Fisher’s test for periods in, 529-533 
lag covariance, 516 
linear prediction in, 535-537 
linear process in, 538 
Time series, periodogram analysis of,
529-533 

serial correlation function of, 516 
Slutzky’s theorem, 537-538 
smoothed, 537-538 
spectral density function of, 520, 538 
spectral distribution function of, 517, 
520 

stationary, 515 
strictly stationary, 515 
variate difference method in, 526-527 
weakly stationary, 516 
white noise, 521, 522 
Tolerance intervals, distribution-free, 
334, 342 

for the problem of two samples, 343 
for normal distribution, 343 
Tolerance limits, distribution-free, 334 
for finite populations, 335 
for normal distribution, 343 
Tolerance regions, 335, 343 
Two-sample problem, distribution-free, 
441-442 

empty block test for, 446-452 
for binomial distributions, 423 
for continuous distributions, 441- 
442 

for normal distributions with identi¬ 
cal variances, 422 
for Poisson distributions, 425 
Hotelling’s T² for, 559-560
Mann-Whitney test for, 459-462 
Matusita’s inequality for the, 252 
run test for, 452-454 
Smirnov test for, 454-459 
Student’s ratio for the, 245 
Two-stage sampling, confidence interval 
of fixed length of mean of normal
distribution by, 497-498 
of finite populations with many 
strata, 318-323 

of populations with unknown strata 
sizes, means and variances, 327 
to minimize variance of estimator of 
finite population mean at fixed total 
cost, 318 

variance of fixed size of difference of 
means of two normal distributions 
by, 501 
Two-stage sampling, variance of fixed
size of mean of normal distribu¬ 
tion by, 501 

Type I error of a statistical test, 395 
Type II error of a statistical test, 395

Unbiased estimator, 277 
Unbiased linear estimator, 277-278 
Unbiased quadratic estimator, 278-279 
Uncorrelated random variables, 78 
Uniformly most powerful statistical 
test, definition of, 396 
for mean of normal distribution, 401 

Variance, definition of, 74 
of an estimator of a parameter, lower 
bound of, 353 

of linear function of random variables,
82
of a sample, 196 
unbiased estimator for, 199, 218 
Variance components, for balanced in¬ 
complete two-way layout, 233 
estimators for these components,
311

for complete two-way layout design, 
223 

estimators for these components, 
309 

for complete three-factor layout, 230 
estimators for these components,
312

for Latin square layout design, 234 
estimators for these components,
313


Variance-ratio distribution, see Snedecor 
distribution 

Variate difference method for time 
series, 526-527 

Venn diagram, 4, 5 

Waiting-time distributions, binomial, 
143-144 

continuous, 172-173 
hypergeometric, 142-143
logarithmic transformation for large 
samples from, 274 

Wald-Blackwell theorem, generaliza¬ 
tion of, 498 

on sum of random variables in se¬ 
quential process, 476 

Weak law of large numbers, 99, 255 

Weighing problems, 286 

White noise, 521 
covariance function of a, 538 
test for, 533-535 

Wilcoxon two-sample test, see Mann- 
Whitney test 

Wishart distribution, characteristic func¬ 
tion of, 554 
definition of, 551 
derivation of, 547-552 
reproductivity of, 554 

Within-sample scatter matrix, 559, 564, 
574 

Within-strata component of variance, 
313 

Yule’s birth process, 192 